Re: [PATCH v2 07/20] audio: add Apple Sound Chip (ASC) emulation

2023-09-14 Thread Volker Rümelin
Am 09.09.23 um 11:48 schrieb Mark Cave-Ayland:
> The Apple Sound Chip was primarily used by the Macintosh II to generate sound
> in hardware which was previously handled by the toolbox ROM with software
> interrupts.
>
> Implement both the standard ASC and also the enhanced ASC (EASC) functionality
> which is used in the Quadra 800.
>
> Note that whilst real ASC hardware uses AUDIO_FORMAT_S8, this implementation
> uses AUDIO_FORMAT_U8 instead because AUDIO_FORMAT_S8 is rarely used and not
> supported by some audio backends like PulseAudio and DirectSound when played
> directly with -audiodev out.mixing-engine=off.
>
> Co-developed-by: Laurent Vivier 
> Co-developed-by: Volker Rümelin 
> Signed-off-by: Mark Cave-Ayland 
> ---
>  MAINTAINERS|   2 +
>  hw/audio/Kconfig   |   3 +
>  hw/audio/asc.c | 699 +
>  hw/audio/meson.build   |   1 +
>  hw/audio/trace-events  |  10 +
>  hw/m68k/Kconfig|   1 +
>  include/hw/audio/asc.h |  84 +
>  7 files changed, 800 insertions(+)
>  create mode 100644 hw/audio/asc.c
>  create mode 100644 include/hw/audio/asc.h

Hi Mark,

the function generate_fifo() has four issues. Only the first one
is noticeable.

1. The calculation of the variable limit assumes generate_fifo()
generates one output sample from every input byte. This is correct
for raw mode, but not for CD-XA BRR mode, which generates 28
output samples from every 15 input bytes. This is the reason for
the stuttering end of a CD-XA BRR mode sound: when the FIFO bytes
are running low, each generate_fifo() call generates only about
half of the possible samples.

2. generate_fifo() doesn't generate the last output sample of
a CD-XA BRR mode sound. The last sample is generated from internal
state, but the code is never called without at least one byte
in the FIFO.

3. It's not necessary to wait for a complete 15 byte packet in
CD-XA BRR mode. Audio playback devices should write all
requested samples immediately if possible.

4. The saturation function in CD-XA BRR mode operates on 16 bit
integers, so it should saturate at +32767 and -32768.

Since I think a few lines of code explain the issues better
than my words, I've attached a patch below.

With best regards,
Volker

> +static int generate_fifo(ASCState *s, int maxsamples)
> +{
> +int64_t now = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
> +uint8_t *buf = s->mixbuf;
> +int i, limit, count = 0;
> +
> +limit = MIN(MAX(s->fifos[0].cnt, s->fifos[1].cnt), maxsamples);
> +while (count < limit) {
> +uint8_t val;
> +int16_t d, f0, f1;
> +int32_t t;
> +int shift, filter;
> +bool hasdata = true;
> +
> +for (i = 0; i < 2; i++) {
> +ASCFIFOState *fs = &s->fifos[i];
> +
> +switch (fs->extregs[ASC_EXTREGS_FIFOCTRL] & 0x83) {
> +case 0x82:
> +/*
> + * CD-XA BRR mode: exit if there isn't enough data in the FIFO
> + * for a complete 15 byte packet
> + */
> +if (fs->xa_cnt == -1 && fs->cnt < 15) {
> +hasdata = false;
> +continue;
> +}
> +
> +if (fs->xa_cnt == -1) {
> +/* Start of packet, get flags */
> +fs->xa_flags = asc_fifo_get(fs);
> +fs->xa_cnt = 0;
> +}
> +
> +shift = fs->xa_flags & 0xf;
> +filter = fs->xa_flags >> 4;
> +f0 = (int8_t)fs->extregs[ASC_EXTREGS_CDXA_DECOMP_FILT +
> + (filter << 1) + 1];
> +f1 = (int8_t)fs->extregs[ASC_EXTREGS_CDXA_DECOMP_FILT +
> + (filter << 1)];
> +if ((fs->xa_cnt & 1) == 0) {
> +fs->xa_val = asc_fifo_get(fs);
> +d = (fs->xa_val & 0xf) << 12;
> +} else {
> +d = (fs->xa_val & 0xf0) << 8;
> +}
> +t = (d >> shift) + (((fs->xa_last[0] * f0) +
> + (fs->xa_last[1] * f1) + 32) >> 6);
> +if (t < -32768) {
> +t = -32768;
> +} else if (t > 32768) {
> +t = 32768;
> +}
> +
> +/*
> + * CD-XA BRR generates 16-bit signed output, so convert to
> + * 8-bit before writing to buffer. Does real hardware do the
> + * same?
> + */
> +buf[count * 2 + i] = (uint8_t)(t / 256) ^ 0x80;
> +fs->xa_cnt++;
> +
> +fs->xa_last[1] = fs->xa_last[0];
> +fs->xa_last[0] = (int16_t)t;
> +
> +if (fs->xa_cnt == 28) {
> +/* End of packet */
> +fs->xa_cnt = -1;
> +}
> +break;

[PATCH v4 03/21] softmmu: Fix CPUSTATE.nr_cores' calculation

2023-09-14 Thread Zhao Liu
From: Zhuocheng Ding 

From CPUState.nr_cores' comment, it represents "number of cores within
this CPU package".

After 003f230e37d7 ("machine: Tweak the order of topology members in
struct CpuTopology"), the meaning of smp.cores changed to "the number of
cores in one die", but that commit did not update CPUState.nr_cores'
calculation, so CPUState.nr_cores is now wrong: it fails to account
for the numbers of clusters and dies.

At present, only i386 is using CPUState.nr_cores.

But for i386, which supports the die level, the uses of
CPUState.nr_cores are very confusing:

Early uses are based on the meaning of "cores per package" (before die
was introduced into i386), and later uses are based on "cores per die"
(after die's introduction).

This difference arose because commit a94e1428991f ("target/i386: Add
CPUID.1F generation support for multi-dies PCMachine") misunderstood
CPUState.nr_cores to mean "cores per die" when calculating
CPUID.1FH.01H:EBX. After that, the i386 changes all followed this
wrong understanding.

Under the influence of 003f230e37d7 and a94e1428991f, the result of
CPUState.nr_cores for i386 is currently "cores per die", so the original
uses of CPUState.nr_cores based on the meaning of "cores per package"
are wrong when multiple dies exist:
1. In cpu_x86_cpuid() of target/i386/cpu.c, CPUID.01H:EBX[bits 23:16] is
   incorrect because it expects "cpus per package" but now the
   result is "cpus per die".
2. In cpu_x86_cpuid() of target/i386/cpu.c, for all leaves of CPUID.04H:
   EAX[bits 31:26] is incorrect because they expect "cpus per package"
   but now the result is "cpus per die". The error not only impacts the
   EAX calculation in cache_info_passthrough case, but also impacts other
   cases of setting cache topology for Intel CPU according to cpu
   topology (specifically, the incoming parameter "num_cores" expects
   "cores per package" in encode_cache_cpuid4()).
3. In cpu_x86_cpuid() of target/i386/cpu.c, CPUID.0BH.01H:EBX[bits
   15:00] is incorrect because the EBX of 0BH.01H (core level) expects
   "cpus per package", which may differ from 1FH.01H (the reason is
   that 1FH can support more levels; for QEMU, 1FH also supports die,
   and 1FH.01H:EBX[bits 15:00] expects "cpus per die").
4. In cpu_x86_cpuid() of target/i386/cpu.c, when CPUID.8001H is
   calculated, "cpus per package" is expected to be checked, but in
   fact it now checks "cpus per die". Though "cpus per die" also works
   for this code logic, it isn't consistent with AMD's APM.
5. In cpu_x86_cpuid() of target/i386/cpu.c, CPUID.8008H:ECX expects
   "cpus per package" but it obtains "cpus per die".
6. In simulate_rdmsr() of target/i386/hvf/x86_emu.c, in
   kvm_rdmsr_core_thread_count() of target/i386/kvm/kvm.c, and in
   helper_rdmsr() of target/i386/tcg/sysemu/misc_helper.c,
   MSR_CORE_THREAD_COUNT expects "cpus per package" and "cores per
   package", but in these functions, it obtains "cpus per die" and
   "cores per die".

On the other hand, these uses are correct now (they are added in/after
a94e1428991f):
1. In cpu_x86_cpuid() of target/i386/cpu.c, topo_info.cores_per_die
   meets the actual meaning of CPUState.nr_cores ("cores per die").
2. In cpu_x86_cpuid() of target/i386/cpu.c, vcpus_per_socket (in CPUID.
   04H's calculation) considers number of dies, so it's correct.
3. In cpu_x86_cpuid() of target/i386/cpu.c, CPUID.1FH.01H:EBX[bits
   15:00] needs "cpus per die" and it gets the correct result, and
   CPUID.1FH.02H:EBX[bits 15:00] gets correct "cpus per package".

When CPUState.nr_cores is correctly changed back to "cores per package",
the above errors will be fixed without extra work, but the "currently"
correct cases will go wrong and need special handling to obtain the
correct "cpus/cores per die" values they want.

Fix CPUState.nr_cores' calculation to fit the original meaning "cores
per package", and adjust the calculations of topo_info.cores_per_die,
vcpus_per_socket and CPUID.1FH accordingly.

Fixes: a94e1428991f ("target/i386: Add CPUID.1F generation support for multi-dies PCMachine")
Fixes: 003f230e37d7 ("machine: Tweak the order of topology members in struct CpuTopology")
Signed-off-by: Zhuocheng Ding 
Co-developed-by: Zhao Liu 
Signed-off-by: Zhao Liu 
---
Changes since v3:
 * Describe changes in imperative mood. (Babu)
 * Fix spelling typo. (Babu)
 * Split the comment change into a separate patch. (Xiaoyao)

Changes since v2:
 * Use wrapped helper to get cores per socket in qemu_init_vcpu().

Changes since v1:
 * Add comment for nr_dies in CPUX86State. (Yanan)
---
 softmmu/cpus.c| 2 +-
 target/i386/cpu.c | 9 -
 2 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/softmmu/cpus.c b/softmmu/cpus.c
index 0848e0dbdb3f..fa8239c217ff 100644
--- a/softmmu/cpus.c
+++ b/softmmu/cpus.c
@@ -624,7 +624,7 @@ void qemu_init_vcpu(CPUState *cpu)
 {
 MachineState *ms = MACHINE(qdev_get_machine());
 
-cpu->nr_cores = ms->smp.cores;
+cpu->nr_cores = machine_topo_get_co

[PATCH v4 02/21] tests: Rename test-x86-cpuid.c to test-x86-topo.c

2023-09-14 Thread Zhao Liu
From: Zhao Liu 

The tests in this file actually test the APIC ID combinations.
Rename to test-x86-topo.c to make its name more in line with its
actual content.

Signed-off-by: Zhao Liu 
Tested-by: Yongwei Ma 
Reviewed-by: Philippe Mathieu-Daudé 
Acked-by: Michael S. Tsirkin 
---
Changes since v3:
 * Modify the description in commit message to emphasize this file tests
   APIC ID combinations. (Babu)

Changes since v1:
 * Rename test-x86-apicid.c to test-x86-topo.c. (Yanan)
---
 MAINTAINERS  | 2 +-
 tests/unit/meson.build   | 4 ++--
 tests/unit/{test-x86-cpuid.c => test-x86-topo.c} | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)
 rename tests/unit/{test-x86-cpuid.c => test-x86-topo.c} (99%)

diff --git a/MAINTAINERS b/MAINTAINERS
index 00562f924f7a..eaaa041804b2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1721,7 +1721,7 @@ F: include/hw/southbridge/ich9.h
 F: include/hw/southbridge/piix.h
 F: hw/isa/apm.c
 F: include/hw/isa/apm.h
-F: tests/unit/test-x86-cpuid.c
+F: tests/unit/test-x86-topo.c
 F: tests/qtest/test-x86-cpuid-compat.c
 
 PC Chipset
diff --git a/tests/unit/meson.build b/tests/unit/meson.build
index 0299ef6906cc..bb6f8177341d 100644
--- a/tests/unit/meson.build
+++ b/tests/unit/meson.build
@@ -21,8 +21,8 @@ tests = {
   'test-opts-visitor': [testqapi],
   'test-visitor-serialization': [testqapi],
   'test-bitmap': [],
-  # all code tested by test-x86-cpuid is inside topology.h
-  'test-x86-cpuid': [],
+  # all code tested by test-x86-topo is inside topology.h
+  'test-x86-topo': [],
   'test-cutils': [],
   'test-div128': [],
   'test-shift128': [],
diff --git a/tests/unit/test-x86-cpuid.c b/tests/unit/test-x86-topo.c
similarity index 99%
rename from tests/unit/test-x86-cpuid.c
rename to tests/unit/test-x86-topo.c
index bfabc0403a1a..2b104f86d7c2 100644
--- a/tests/unit/test-x86-cpuid.c
+++ b/tests/unit/test-x86-topo.c
@@ -1,5 +1,5 @@
 /*
- *  Test code for x86 CPUID and Topology functions
+ *  Test code for x86 APIC ID and Topology functions
  *
  *  Copyright (c) 2012 Red Hat Inc.
  *
-- 
2.34.1




[PATCH v4 01/21] i386: Fix comment style in topology.h

2023-09-14 Thread Zhao Liu
From: Zhao Liu 

For function comments in this file, keep the comment style consistent
with other files in the directory.

Signed-off-by: Zhao Liu 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Yanan Wang 
Reviewed-by: Xiaoyao Li 
Acked-by: Michael S. Tsirkin 
---
Changes since v3:
 * Optimized the description in commit message: Change "with other
   places" to "with other files in the directory". (Babu)
---
 include/hw/i386/topology.h | 33 +
 1 file changed, 17 insertions(+), 16 deletions(-)

diff --git a/include/hw/i386/topology.h b/include/hw/i386/topology.h
index 81573f6cfde0..5a19679f618b 100644
--- a/include/hw/i386/topology.h
+++ b/include/hw/i386/topology.h
@@ -24,7 +24,8 @@
 #ifndef HW_I386_TOPOLOGY_H
 #define HW_I386_TOPOLOGY_H
 
-/* This file implements the APIC-ID-based CPU topology enumeration logic,
+/*
+ * This file implements the APIC-ID-based CPU topology enumeration logic,
  * documented at the following document:
  *   Intel® 64 Architecture Processor Topology Enumeration
  *   
http://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/
@@ -41,7 +42,8 @@
 
 #include "qemu/bitops.h"
 
-/* APIC IDs can be 32-bit, but beware: APIC IDs > 255 require x2APIC support
+/*
+ * APIC IDs can be 32-bit, but beware: APIC IDs > 255 require x2APIC support
  */
 typedef uint32_t apic_id_t;
 
@@ -58,8 +60,7 @@ typedef struct X86CPUTopoInfo {
 unsigned threads_per_core;
 } X86CPUTopoInfo;
 
-/* Return the bit width needed for 'count' IDs
- */
+/* Return the bit width needed for 'count' IDs */
 static unsigned apicid_bitwidth_for_count(unsigned count)
 {
 g_assert(count >= 1);
@@ -67,15 +68,13 @@ static unsigned apicid_bitwidth_for_count(unsigned count)
 return count ? 32 - clz32(count) : 0;
 }
 
-/* Bit width of the SMT_ID (thread ID) field on the APIC ID
- */
+/* Bit width of the SMT_ID (thread ID) field on the APIC ID */
 static inline unsigned apicid_smt_width(X86CPUTopoInfo *topo_info)
 {
 return apicid_bitwidth_for_count(topo_info->threads_per_core);
 }
 
-/* Bit width of the Core_ID field
- */
+/* Bit width of the Core_ID field */
 static inline unsigned apicid_core_width(X86CPUTopoInfo *topo_info)
 {
 return apicid_bitwidth_for_count(topo_info->cores_per_die);
@@ -87,8 +86,7 @@ static inline unsigned apicid_die_width(X86CPUTopoInfo *topo_info)
 return apicid_bitwidth_for_count(topo_info->dies_per_pkg);
 }
 
-/* Bit offset of the Core_ID field
- */
+/* Bit offset of the Core_ID field */
 static inline unsigned apicid_core_offset(X86CPUTopoInfo *topo_info)
 {
 return apicid_smt_width(topo_info);
@@ -100,14 +98,14 @@ static inline unsigned apicid_die_offset(X86CPUTopoInfo *topo_info)
 return apicid_core_offset(topo_info) + apicid_core_width(topo_info);
 }
 
-/* Bit offset of the Pkg_ID (socket ID) field
- */
+/* Bit offset of the Pkg_ID (socket ID) field */
 static inline unsigned apicid_pkg_offset(X86CPUTopoInfo *topo_info)
 {
 return apicid_die_offset(topo_info) + apicid_die_width(topo_info);
 }
 
-/* Make APIC ID for the CPU based on Pkg_ID, Core_ID, SMT_ID
+/*
+ * Make APIC ID for the CPU based on Pkg_ID, Core_ID, SMT_ID
  *
  * The caller must make sure core_id < nr_cores and smt_id < nr_threads.
  */
@@ -120,7 +118,8 @@ static inline apic_id_t x86_apicid_from_topo_ids(X86CPUTopoInfo *topo_info,
topo_ids->smt_id;
 }
 
-/* Calculate thread/core/package IDs for a specific topology,
+/*
+ * Calculate thread/core/package IDs for a specific topology,
  * based on (contiguous) CPU index
  */
 static inline void x86_topo_ids_from_idx(X86CPUTopoInfo *topo_info,
@@ -137,7 +136,8 @@ static inline void x86_topo_ids_from_idx(X86CPUTopoInfo *topo_info,
 topo_ids->smt_id = cpu_index % nr_threads;
 }
 
-/* Calculate thread/core/package IDs for a specific topology,
+/*
+ * Calculate thread/core/package IDs for a specific topology,
  * based on APIC ID
  */
 static inline void x86_topo_ids_from_apicid(apic_id_t apicid,
@@ -155,7 +155,8 @@ static inline void x86_topo_ids_from_apicid(apic_id_t apicid,
 topo_ids->pkg_id = apicid >> apicid_pkg_offset(topo_info);
 }
 
-/* Make APIC ID for the CPU 'cpu_index'
+/*
+ * Make APIC ID for the CPU 'cpu_index'
  *
  * 'cpu_index' is a sequential, contiguous ID for the CPU.
  */
-- 
2.34.1




[PATCH v4 05/21] i386/cpu: Fix i/d-cache topology to core level for Intel CPU

2023-09-14 Thread Zhao Liu
From: Zhao Liu 

For the i-cache and d-cache, current QEMU hardcodes the maximum IDs for
CPUs sharing the cache (CPUID.04H.00H:EAX[bits 25:14] and
CPUID.04H.01H:EAX[bits 25:14]) to 0, which means the i-cache and
d-cache are shared at the SMT level.

This is correct if there's a single thread per core, but is wrong for
the hyper-threading case (one core contains multiple threads), since
the i-cache and d-cache are shared at the core level rather than the
SMT level.

For AMD CPUs, commit 8f4202fb1080 ("i386: Populate AMD Processor Cache
Information for cpuid 0x801D") has already introduced i/d cache
topology at the core level by default.

Therefore, in order to be compatible with both multi-threaded and
single-threaded situations, set the i-cache and d-cache to be shared
at the core level by default.

This fix changes the default i/d cache topology from per-thread to
per-core. Potentially, this change in L1 cache topology may affect the
performance of the VM if the user does not explicitly specify the
topology or bind the vCPUs. However, the way to achieve optimal
performance is to create a reasonable topology and set the appropriate
vCPU affinity, without relying on QEMU's default topology structure.

Fixes: 7e3482f82480 ("i386: Helpers to encode cache information consistently")
Suggested-by: Robert Hoo 
Signed-off-by: Zhao Liu 
Reviewed-by: Xiaoyao Li 
---
Changes since v3:
 * Change the description of current i/d cache encoding status to avoid
   misleading to "architectural rules". (Xiaoyao)

Changes since v1:
 * Split this fix from the patch named "i386/cpu: Fix number of
   addressable IDs in CPUID.04H".
 * Add the explanation of the impact on performance. (Xiaoyao)
---
 target/i386/cpu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 709c055c8468..c5c2a045e032 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -6108,12 +6108,12 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
 switch (count) {
 case 0: /* L1 dcache info */
 encode_cache_cpuid4(env->cache_info_cpuid4.l1d_cache,
-1, cs->nr_cores,
+cs->nr_threads, cs->nr_cores,
 eax, ebx, ecx, edx);
 break;
 case 1: /* L1 icache info */
 encode_cache_cpuid4(env->cache_info_cpuid4.l1i_cache,
-1, cs->nr_cores,
+cs->nr_threads, cs->nr_cores,
 eax, ebx, ecx, edx);
 break;
 case 2: /* L2 cache info */
-- 
2.34.1




[PATCH v4 07/21] i386/cpu: Consolidate the use of topo_info in cpu_x86_cpuid()

2023-09-14 Thread Zhao Liu
From: Zhao Liu 

In cpu_x86_cpuid(), there are many variables representing the CPU
topology, e.g., topo_info and cs->nr_cores/cs->nr_threads.

Since the names cs->nr_cores/cs->nr_threads do not accurately represent
their meaning, their use is prone to confusion and mistakes.

The structure X86CPUTopoInfo names its members clearly, so the
variable "topo_info" should be preferred.

In addition, to use the topology variables uniformly in cpu_x86_cpuid(),
replace env->dies with topo_info.dies_per_pkg as well.

Suggested-by: Robert Hoo 
Signed-off-by: Zhao Liu 
---
Changes since v3:
 * Fix typo. (Babu)

Changes since v1:
 * Extract cores_per_socket from the code block and use it as a local
   variable for cpu_x86_cpuid(). (Yanan)
 * Remove vcpus_per_socket variable and use cpus_per_pkg directly.
   (Yanan)
 * Replace env->dies with topo_info.dies_per_pkg in cpu_x86_cpuid().
---
 target/i386/cpu.c | 31 ++-
 1 file changed, 18 insertions(+), 13 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index a3d67c1762c0..0e9a33034026 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -6012,11 +6012,16 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
 uint32_t limit;
 uint32_t signature[3];
 X86CPUTopoInfo topo_info;
+uint32_t cores_per_pkg;
+uint32_t cpus_per_pkg;
 
 topo_info.dies_per_pkg = env->nr_dies;
 topo_info.cores_per_die = cs->nr_cores / env->nr_dies;
 topo_info.threads_per_core = cs->nr_threads;
 
+cores_per_pkg = topo_info.cores_per_die * topo_info.dies_per_pkg;
+cpus_per_pkg = cores_per_pkg * topo_info.threads_per_core;
+
 /* Calculate & apply limits for different index ranges */
 if (index >= 0xC000) {
 limit = env->cpuid_xlevel2;
@@ -6052,8 +6057,8 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
 *ecx |= CPUID_EXT_OSXSAVE;
 }
 *edx = env->features[FEAT_1_EDX];
-if (cs->nr_cores * cs->nr_threads > 1) {
-*ebx |= (cs->nr_cores * cs->nr_threads) << 16;
+if (cpus_per_pkg > 1) {
+*ebx |= cpus_per_pkg << 16;
 *edx |= CPUID_HT;
 }
 if (!cpu->enable_pmu) {
@@ -6090,8 +6095,8 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
  */
 if (*eax & 31) {
 int host_vcpus_per_cache = 1 + ((*eax & 0x3FFC000) >> 14);
-int vcpus_per_socket = cs->nr_cores * cs->nr_threads;
-if (cs->nr_cores > 1) {
+
+if (cores_per_pkg > 1) {
 int addressable_cores_offset =
 apicid_pkg_offset(&topo_info) -
 apicid_core_offset(&topo_info);
@@ -6099,7 +6104,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
 *eax &= ~0xFC00;
 *eax |= (1 << (addressable_cores_offset - 1)) << 26;
 }
-if (host_vcpus_per_cache > vcpus_per_socket) {
+if (host_vcpus_per_cache > cpus_per_pkg) {
 int pkg_offset = apicid_pkg_offset(&topo_info);
 
 *eax &= ~0x3FFC000;
@@ -6244,12 +6249,12 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
 switch (count) {
 case 0:
 *eax = apicid_core_offset(&topo_info);
-*ebx = cs->nr_threads;
+*ebx = topo_info.threads_per_core;
 *ecx |= CPUID_TOPOLOGY_LEVEL_SMT;
 break;
 case 1:
 *eax = apicid_pkg_offset(&topo_info);
-*ebx = cs->nr_cores * cs->nr_threads;
+*ebx = cpus_per_pkg;
 *ecx |= CPUID_TOPOLOGY_LEVEL_CORE;
 break;
 default:
@@ -6270,7 +6275,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
 break;
 case 0x1F:
 /* V2 Extended Topology Enumeration Leaf */
-if (env->nr_dies < 2) {
+if (topo_info.dies_per_pkg < 2) {
 *eax = *ebx = *ecx = *edx = 0;
 break;
 }
@@ -6280,7 +6285,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
 switch (count) {
 case 0:
 *eax = apicid_core_offset(&topo_info);
-*ebx = cs->nr_threads;
+*ebx = topo_info.threads_per_core;
 *ecx |= CPUID_TOPOLOGY_LEVEL_SMT;
 break;
 case 1:
@@ -6290,7 +6295,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
 break;
 case 2:
 *eax = apicid_pkg_offset(&topo_info);
-*ebx = cs->nr_cores * cs->nr_threads;
+*ebx = cpus_per_pkg;
 *ecx |= CPUID_TOPOLOGY_LEVEL_DIE;
 break;
 default:
@@ -6515,7 +6520,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32

[PATCH v4 14/21] i386/cpu: Introduce cluster-id to X86CPU

2023-09-14 Thread Zhao Liu
From: Zhuocheng Ding 

Introduce cluster-id rather than module-id to be consistent with
CpuInstanceProperties.cluster-id; this avoids confusion over parameter
names when hotplugging.

Following the legacy smp check rules, also add a cluster_id validity
check to x86_cpu_pre_plug().

Signed-off-by: Zhuocheng Ding 
Co-developed-by: Zhao Liu 
Signed-off-by: Zhao Liu 
Acked-by: Michael S. Tsirkin 
---
Changes since v3:
 * Use the imperative in the commit message. (Babu)
---
 hw/i386/x86.c | 33 +
 target/i386/cpu.c |  2 ++
 target/i386/cpu.h |  1 +
 3 files changed, 28 insertions(+), 8 deletions(-)

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 5b05dbdedbff..a93f0771d97b 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -325,6 +325,14 @@ void x86_cpu_pre_plug(HotplugHandler *hotplug_dev,
 cpu->die_id = 0;
 }
 
+/*
+ * cluster-id was optional in QEMU 8.0 and older, so keep it optional
+ * if there's only one cluster per die.
+ */
+if (cpu->cluster_id < 0 && ms->smp.clusters == 1) {
+cpu->cluster_id = 0;
+}
+
 if (cpu->socket_id < 0) {
 error_setg(errp, "CPU socket-id is not set");
 return;
@@ -341,6 +349,14 @@ void x86_cpu_pre_plug(HotplugHandler *hotplug_dev,
cpu->die_id, ms->smp.dies - 1);
 return;
 }
+if (cpu->cluster_id < 0) {
+error_setg(errp, "CPU cluster-id is not set");
+return;
+} else if (cpu->cluster_id > ms->smp.clusters - 1) {
+error_setg(errp, "Invalid CPU cluster-id: %u must be in range 0:%u",
+   cpu->cluster_id, ms->smp.clusters - 1);
+return;
+}
 if (cpu->core_id < 0) {
 error_setg(errp, "CPU core-id is not set");
 return;
@@ -360,16 +376,9 @@ void x86_cpu_pre_plug(HotplugHandler *hotplug_dev,
 
 topo_ids.pkg_id = cpu->socket_id;
 topo_ids.die_id = cpu->die_id;
+topo_ids.module_id = cpu->cluster_id;
 topo_ids.core_id = cpu->core_id;
 topo_ids.smt_id = cpu->thread_id;
-
-/*
- * TODO: This is the temporary initialization for topo_ids.module_id to
- * avoid "maybe-uninitialized" compilation errors. Will remove when
- * X86CPU supports cluster_id.
- */
-topo_ids.module_id = 0;
-
 cpu->apic_id = x86_apicid_from_topo_ids(&topo_info, &topo_ids);
 }
 
@@ -416,6 +425,14 @@ void x86_cpu_pre_plug(HotplugHandler *hotplug_dev,
 }
 cpu->die_id = topo_ids.die_id;
 
+if (cpu->cluster_id != -1 && cpu->cluster_id != topo_ids.module_id) {
+error_setg(errp, "property cluster-id: %u doesn't match set apic-id:"
+" 0x%x (cluster-id: %u)", cpu->cluster_id, cpu->apic_id,
+topo_ids.module_id);
+return;
+}
+cpu->cluster_id = topo_ids.module_id;
+
 if (cpu->core_id != -1 && cpu->core_id != topo_ids.core_id) {
 error_setg(errp, "property core-id: %u doesn't match set apic-id:"
 " 0x%x (core-id: %u)", cpu->core_id, cpu->apic_id,
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index f0ddb253b6b5..d8c5e774cf95 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -7907,12 +7907,14 @@ static Property x86_cpu_properties[] = {
 DEFINE_PROP_UINT32("apic-id", X86CPU, apic_id, 0),
 DEFINE_PROP_INT32("thread-id", X86CPU, thread_id, 0),
 DEFINE_PROP_INT32("core-id", X86CPU, core_id, 0),
+DEFINE_PROP_INT32("cluster-id", X86CPU, cluster_id, 0),
 DEFINE_PROP_INT32("die-id", X86CPU, die_id, 0),
 DEFINE_PROP_INT32("socket-id", X86CPU, socket_id, 0),
 #else
 DEFINE_PROP_UINT32("apic-id", X86CPU, apic_id, UNASSIGNED_APIC_ID),
 DEFINE_PROP_INT32("thread-id", X86CPU, thread_id, -1),
 DEFINE_PROP_INT32("core-id", X86CPU, core_id, -1),
+DEFINE_PROP_INT32("cluster-id", X86CPU, cluster_id, -1),
 DEFINE_PROP_INT32("die-id", X86CPU, die_id, -1),
 DEFINE_PROP_INT32("socket-id", X86CPU, socket_id, -1),
 #endif
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 54019e82fdb4..fa1452380882 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -2059,6 +2059,7 @@ struct ArchCPU {
 int32_t node_id; /* NUMA node this CPU belongs to */
 int32_t socket_id;
 int32_t die_id;
+int32_t cluster_id;
 int32_t core_id;
 int32_t thread_id;
 
-- 
2.34.1




[PATCH v4 06/21] i386/cpu: Use APIC ID offset to encode cache topo in CPUID[4]

2023-09-14 Thread Zhao Liu
From: Zhao Liu 

Referring to the fixes of cache_info_passthrough ([1], [2]) and the SDM,
CPUID.04H:EAX[bits 25:14] and CPUID.04H:EAX[bits 31:26] should use the
nearest power-of-2 integer.

The nearest power-of-2 integer can be calculated by pow2ceil() or by
using APIC ID offset (like L3 topology using 1 << die_offset [3]).

But in fact, CPUID.04H:EAX[bits 25:14] and CPUID.04H:EAX[bits 31:26]
are associated with APIC ID. For example, in linux kernel, the field
"num_threads_sharing" (Bits 25 - 14) is parsed with APIC ID. And for
another example, on Alder Lake P, the CPUID.04H:EAX[bits 31:26] is not
matched with actual core numbers and it's calculated by:
"(1 << (pkg_offset - core_offset)) - 1".

Therefore the offset of APIC ID should be preferred to calculate nearest
power-of-2 integer for CPUID.04H:EAX[bits 25:14] and CPUID.04H:EAX[bits
31:26]:
1. The d/i cache is shared in a core, so 1 << core_offset should be used
   instead of "cs->nr_threads" in encode_cache_cpuid4() for
   CPUID.04H.00H:EAX[bits 25:14] and CPUID.04H.01H:EAX[bits 25:14].
2. The L2 cache is supposed to be shared in a core as for now, so
   1 << core_offset should also be used instead of "cs->nr_threads" in
   encode_cache_cpuid4() for CPUID.04H.02H:EAX[bits 25:14].
3. Similarly, the value for CPUID.04H:EAX[bits 31:26] should also be
   calculated from the bit width between the Package and SMT levels in
   the APIC ID ((1 << (pkg_offset - core_offset)) - 1).

In addition, use APIC ID offset to replace "pow2ceil()" for
cache_info_passthrough case.

[1]: efb3934adf9e ("x86: cpu: make sure number of addressable IDs for processor cores meets the spec")
[2]: d7caf13b5fcf ("x86: cpu: fixup number of addressable IDs for logical processors sharing cache")
[3]: d65af288a84d ("i386: Update new x86_apicid parsing rules with die_offset support")

Fixes: 7e3482f82480 ("i386: Helpers to encode cache information consistently")
Suggested-by: Robert Hoo 
Signed-off-by: Zhao Liu 
---
Changes since v3:
 * Fix compile warnings. (Babu)
 * Fix spelling typo.

Changes since v1:
 * Use APIC ID offset to replace "pow2ceil()" for cache_info_passthrough
   case. (Yanan)
 * Split the L1 cache fix into a separate patch.
 * Rename the title of this patch (the original is "i386/cpu: Fix number
   of addressable IDs in CPUID.04H").
---
 target/i386/cpu.c | 30 +++---
 1 file changed, 23 insertions(+), 7 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index c5c2a045e032..a3d67c1762c0 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -6009,7 +6009,6 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
 {
 X86CPU *cpu = env_archcpu(env);
 CPUState *cs = env_cpu(env);
-uint32_t die_offset;
 uint32_t limit;
 uint32_t signature[3];
 X86CPUTopoInfo topo_info;
@@ -6093,39 +6092,56 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
 int host_vcpus_per_cache = 1 + ((*eax & 0x3FFC000) >> 14);
 int vcpus_per_socket = cs->nr_cores * cs->nr_threads;
 if (cs->nr_cores > 1) {
+int addressable_cores_offset =
+apicid_pkg_offset(&topo_info) -
+apicid_core_offset(&topo_info);
+
 *eax &= ~0xFC00;
-*eax |= (pow2ceil(cs->nr_cores) - 1) << 26;
+*eax |= (1 << (addressable_cores_offset - 1)) << 26;
 }
 if (host_vcpus_per_cache > vcpus_per_socket) {
+int pkg_offset = apicid_pkg_offset(&topo_info);
+
 *eax &= ~0x3FFC000;
-*eax |= (pow2ceil(vcpus_per_socket) - 1) << 14;
+*eax |= (1 << (pkg_offset - 1)) << 14;
 }
 }
 } else if (cpu->vendor_cpuid_only && IS_AMD_CPU(env)) {
 *eax = *ebx = *ecx = *edx = 0;
 } else {
 *eax = 0;
+int addressable_cores_offset = apicid_pkg_offset(&topo_info) -
+   apicid_core_offset(&topo_info);
+int core_offset, die_offset;
+
 switch (count) {
 case 0: /* L1 dcache info */
+core_offset = apicid_core_offset(&topo_info);
 encode_cache_cpuid4(env->cache_info_cpuid4.l1d_cache,
-cs->nr_threads, cs->nr_cores,
+(1 << core_offset),
+(1 << addressable_cores_offset),
 eax, ebx, ecx, edx);
 break;
 case 1: /* L1 icache info */
+core_offset = apicid_core_offset(&topo_info);
 encode_cache_cpuid4(env->cache_info_cpuid4.l1i_cache,
-cs->nr_threads, cs->nr_cores,
+(1 << core_offset),
+   

[PATCH v4 08/21] i386: Split topology types of CPUID[0x1F] from the definitions of CPUID[0xB]

2023-09-14 Thread Zhao Liu
From: Zhao Liu 

CPUID[0xB] defines SMT, Core and Invalid types, and this leaf is shared
by Intel and AMD CPUs.

But for extended topology levels, Intel CPUs (in CPUID[0x1F]) and AMD
CPUs (in CPUID[0x8026]) have different definitions with different
enumeration values.

Though CPUID[0x8026] hasn't been implemented in QEMU, to avoid possible
misunderstanding, split the topology types of CPUID[0x1F] from the
definitions of CPUID[0xB] and introduce CPUID[0x1F]-specific topology
types.

Signed-off-by: Zhao Liu 
---
Changes since v3:
 * New commit to prepare to refactor CPUID[0x1F] encoding.
---
 target/i386/cpu.c | 14 +++---
 target/i386/cpu.h | 13 +
 2 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 0e9a33034026..88ccc9df9118 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -6250,17 +6250,17 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
 case 0:
 *eax = apicid_core_offset(&topo_info);
 *ebx = topo_info.threads_per_core;
-*ecx |= CPUID_TOPOLOGY_LEVEL_SMT;
+*ecx |= CPUID_B_ECX_TOPO_LEVEL_SMT << 8;
 break;
 case 1:
 *eax = apicid_pkg_offset(&topo_info);
 *ebx = cpus_per_pkg;
-*ecx |= CPUID_TOPOLOGY_LEVEL_CORE;
+*ecx |= CPUID_B_ECX_TOPO_LEVEL_CORE << 8;
 break;
 default:
 *eax = 0;
 *ebx = 0;
-*ecx |= CPUID_TOPOLOGY_LEVEL_INVALID;
+*ecx |= CPUID_B_ECX_TOPO_LEVEL_INVALID << 8;
 }
 
 assert(!(*eax & ~0x1f));
@@ -6286,22 +6286,22 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
 case 0:
 *eax = apicid_core_offset(&topo_info);
 *ebx = topo_info.threads_per_core;
-*ecx |= CPUID_TOPOLOGY_LEVEL_SMT;
+*ecx |= CPUID_1F_ECX_TOPO_LEVEL_SMT << 8;
 break;
 case 1:
 *eax = apicid_die_offset(&topo_info);
 *ebx = topo_info.cores_per_die * topo_info.threads_per_core;
-*ecx |= CPUID_TOPOLOGY_LEVEL_CORE;
+*ecx |= CPUID_1F_ECX_TOPO_LEVEL_CORE << 8;
 break;
 case 2:
 *eax = apicid_pkg_offset(&topo_info);
 *ebx = cpus_per_pkg;
-*ecx |= CPUID_TOPOLOGY_LEVEL_DIE;
+*ecx |= CPUID_1F_ECX_TOPO_LEVEL_DIE << 8;
 break;
 default:
 *eax = 0;
 *ebx = 0;
-*ecx |= CPUID_TOPOLOGY_LEVEL_INVALID;
+*ecx |= CPUID_1F_ECX_TOPO_LEVEL_INVALID << 8;
 }
 assert(!(*eax & ~0x1f));
 *ebx &= 0xffff; /* The count doesn't need to be reliable. */
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 70eb3bc23eb8..97113ad3d002 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1009,10 +1009,15 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord 
w,
 #define CPUID_MWAIT_EMX (1U << 0) /* enumeration supported */
 
 /* CPUID[0xB].ECX level types */
-#define CPUID_TOPOLOGY_LEVEL_INVALID  (0U << 8)
-#define CPUID_TOPOLOGY_LEVEL_SMT  (1U << 8)
-#define CPUID_TOPOLOGY_LEVEL_CORE (2U << 8)
-#define CPUID_TOPOLOGY_LEVEL_DIE  (5U << 8)
+#define CPUID_B_ECX_TOPO_LEVEL_INVALID  0
+#define CPUID_B_ECX_TOPO_LEVEL_SMT  1
+#define CPUID_B_ECX_TOPO_LEVEL_CORE 2
+
+/* CPUID[0x1F].ECX level types */
+#define CPUID_1F_ECX_TOPO_LEVEL_INVALID  CPUID_B_ECX_TOPO_LEVEL_INVALID
+#define CPUID_1F_ECX_TOPO_LEVEL_SMT  CPUID_B_ECX_TOPO_LEVEL_SMT
+#define CPUID_1F_ECX_TOPO_LEVEL_CORE CPUID_B_ECX_TOPO_LEVEL_CORE
+#define CPUID_1F_ECX_TOPO_LEVEL_DIE  5
 
 /* MSR Feature Bits */
 #define MSR_ARCH_CAP_RDCL_NO(1U << 0)
-- 
2.34.1




[PATCH v4 12/21] i386: Expose module level in CPUID[0x1F]

2023-09-14 Thread Zhao Liu
From: Zhao Liu 

Linux kernel (from v6.4, with commit edc0a2b595765 ("x86/topology: Fix
erroneous smp_num_siblings on Intel Hybrid platforms")) is able to
handle platforms with the Module level enumerated via CPUID.1F.

Expose the module level in CPUID[0x1F] if the machine has more than
one module.

(Tested CPU topology in CPUID[0x1F] leaf with various die/cluster
configurations in "-smp".)

Signed-off-by: Zhao Liu 
Tested-by: Yongwei Ma 
---
Changes since v3:
 * New patch to expose module level in 0x1F.
 * Add Tested-by tag from Yongwei.
---
 target/i386/cpu.c | 12 +++-
 target/i386/cpu.h |  2 ++
 target/i386/kvm/kvm.c |  2 +-
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index cef9a4606d89..f0ddb253b6b5 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -276,6 +276,8 @@ static uint32_t num_cpus_by_topo_level(X86CPUTopoInfo 
*topo_info,
 return 1;
 case CPU_TOPO_LEVEL_CORE:
 return topo_info->threads_per_core;
+case CPU_TOPO_LEVEL_MODULE:
+return topo_info->threads_per_core * topo_info->cores_per_module;
 case CPU_TOPO_LEVEL_DIE:
 return topo_info->threads_per_core * topo_info->cores_per_module *
topo_info->modules_per_die;
@@ -296,6 +298,8 @@ static uint32_t apicid_offset_by_topo_level(X86CPUTopoInfo 
*topo_info,
 return 0;
 case CPU_TOPO_LEVEL_CORE:
 return apicid_core_offset(topo_info);
+case CPU_TOPO_LEVEL_MODULE:
+return apicid_module_offset(topo_info);
 case CPU_TOPO_LEVEL_DIE:
 return apicid_die_offset(topo_info);
 case CPU_TOPO_LEVEL_PACKAGE:
@@ -315,6 +319,8 @@ static uint32_t cpuid1f_topo_type(enum CPUTopoLevel 
topo_level)
 return CPUID_1F_ECX_TOPO_LEVEL_SMT;
 case CPU_TOPO_LEVEL_CORE:
 return CPUID_1F_ECX_TOPO_LEVEL_CORE;
+case CPU_TOPO_LEVEL_MODULE:
+return CPUID_1F_ECX_TOPO_LEVEL_MODULE;
 case CPU_TOPO_LEVEL_DIE:
 return CPUID_1F_ECX_TOPO_LEVEL_DIE;
 default:
@@ -346,6 +352,10 @@ static void encode_topo_cpuid1f(CPUX86State *env, uint32_t 
count,
 if (env->nr_dies > 1) {
 set_bit(CPU_TOPO_LEVEL_DIE, topo_bitmap);
 }
+
+if (env->nr_modules > 1) {
+set_bit(CPU_TOPO_LEVEL_MODULE, topo_bitmap);
+}
 }
 
 *ecx = count & 0xff;
@@ -6390,7 +6400,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
 break;
 case 0x1F:
 /* V2 Extended Topology Enumeration Leaf */
-if (topo_info.dies_per_pkg < 2) {
+if (topo_info.modules_per_die < 2 && topo_info.dies_per_pkg < 2) {
 *eax = *ebx = *ecx = *edx = 0;
 break;
 }
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 556e80f29764..54019e82fdb4 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1018,6 +1018,7 @@ enum CPUTopoLevel {
 CPU_TOPO_LEVEL_INVALID,
 CPU_TOPO_LEVEL_SMT,
 CPU_TOPO_LEVEL_CORE,
+CPU_TOPO_LEVEL_MODULE,
 CPU_TOPO_LEVEL_DIE,
 CPU_TOPO_LEVEL_PACKAGE,
 CPU_TOPO_LEVEL_MAX,
@@ -1032,6 +1033,7 @@ enum CPUTopoLevel {
 #define CPUID_1F_ECX_TOPO_LEVEL_INVALID  CPUID_B_ECX_TOPO_LEVEL_INVALID
 #define CPUID_1F_ECX_TOPO_LEVEL_SMT  CPUID_B_ECX_TOPO_LEVEL_SMT
 #define CPUID_1F_ECX_TOPO_LEVEL_CORE CPUID_B_ECX_TOPO_LEVEL_CORE
+#define CPUID_1F_ECX_TOPO_LEVEL_MODULE   3
 #define CPUID_1F_ECX_TOPO_LEVEL_DIE  5
 
 /* MSR Feature Bits */
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index e5cd7cc80616..545b2d46221e 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -1959,7 +1959,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
 break;
 }
 case 0x1f:
-if (env->nr_dies < 2) {
+if (env->nr_modules < 2 && env->nr_dies < 2) {
 break;
 }
 /* fallthrough */
-- 
2.34.1




[PATCH v4 09/21] i386: Decouple CPUID[0x1F] subleaf with specific topology level

2023-09-14 Thread Zhao Liu
From: Zhao Liu 

At present, subleaf 0x02 of CPUID[0x1F] is bound to the "die" level.

In fact, the specific topology level exposed in 0x1F depends on the
platform's support for extension levels (module, tile and die).

To help expose the "module" level in 0x1F, decouple the CPUID[0x1F]
subleaf from any specific topology level.
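The decoupled encoding effectively walks a bitmap of enabled levels, one subleaf per set bit, falling back to the invalid type once the levels run out. A minimal sketch of that walk (illustrative only; `level_for_subleaf` is a hypothetical helper, and the bit positions stand in for CPUTopoLevel values, not the CPUID type values):

```c
#include <assert.h>
#include <stdint.h>

/* Return the topology level for a given subleaf, walking the set bits
 * of level_mask from lowest to highest; 0 means "invalid" (no more
 * levels to enumerate). Uses the GCC/Clang __builtin_ctz builtin. */
static unsigned level_for_subleaf(uint32_t level_mask, unsigned count)
{
    while (level_mask) {
        unsigned lowest = __builtin_ctz(level_mask);
        if (count == 0) {
            return lowest;
        }
        level_mask &= level_mask - 1; /* clear the lowest set bit */
        count--;
    }
    return 0; /* past the last enumerated level */
}
```

For example, with SMT (bit 1), core (bit 2) and die (bit 4) enabled, subleaves 0..2 report those levels in order and subleaf 3 reports invalid.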

Signed-off-by: Zhao Liu 
---
Changes since v3:
 * New patch to prepare to expose module level in 0x1F.
 * Move the CPUTopoLevel enumeration definition from "i386: Add cache
   topology info in CPUCacheInfo" to this patch. Note, to align with
   topology types in SDM, revert the name of CPU_TOPO_LEVEL_UNKNOW to
   CPU_TOPO_LEVEL_INVALID.
---
 target/i386/cpu.c | 136 +-
 target/i386/cpu.h |  15 +
 2 files changed, 126 insertions(+), 25 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 88ccc9df9118..10e11c85f459 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -268,6 +268,116 @@ static void encode_cache_cpuid4(CPUCacheInfo *cache,
(cache->complex_indexing ? CACHE_COMPLEX_IDX : 0);
 }
 
+static uint32_t num_cpus_by_topo_level(X86CPUTopoInfo *topo_info,
+   enum CPUTopoLevel topo_level)
+{
+switch (topo_level) {
+case CPU_TOPO_LEVEL_SMT:
+return 1;
+case CPU_TOPO_LEVEL_CORE:
+return topo_info->threads_per_core;
+case CPU_TOPO_LEVEL_DIE:
+return topo_info->threads_per_core * topo_info->cores_per_die;
+case CPU_TOPO_LEVEL_PACKAGE:
+return topo_info->threads_per_core * topo_info->cores_per_die *
+   topo_info->dies_per_pkg;
+default:
+g_assert_not_reached();
+}
+return 0;
+}
+
+static uint32_t apicid_offset_by_topo_level(X86CPUTopoInfo *topo_info,
+enum CPUTopoLevel topo_level)
+{
+switch (topo_level) {
+case CPU_TOPO_LEVEL_SMT:
+return 0;
+case CPU_TOPO_LEVEL_CORE:
+return apicid_core_offset(topo_info);
+case CPU_TOPO_LEVEL_DIE:
+return apicid_die_offset(topo_info);
+case CPU_TOPO_LEVEL_PACKAGE:
+return apicid_pkg_offset(topo_info);
+default:
+g_assert_not_reached();
+}
+return 0;
+}
+
+static uint32_t cpuid1f_topo_type(enum CPUTopoLevel topo_level)
+{
+switch (topo_level) {
+case CPU_TOPO_LEVEL_INVALID:
+return CPUID_1F_ECX_TOPO_LEVEL_INVALID;
+case CPU_TOPO_LEVEL_SMT:
+return CPUID_1F_ECX_TOPO_LEVEL_SMT;
+case CPU_TOPO_LEVEL_CORE:
+return CPUID_1F_ECX_TOPO_LEVEL_CORE;
+case CPU_TOPO_LEVEL_DIE:
+return CPUID_1F_ECX_TOPO_LEVEL_DIE;
+default:
+/* Other types are not supported in QEMU. */
+g_assert_not_reached();
+}
+return 0;
+}
+
+static void encode_topo_cpuid1f(CPUX86State *env, uint32_t count,
+X86CPUTopoInfo *topo_info,
+uint32_t *eax, uint32_t *ebx,
+uint32_t *ecx, uint32_t *edx)
+{
+static DECLARE_BITMAP(topo_bitmap, CPU_TOPO_LEVEL_MAX);
+X86CPU *cpu = env_archcpu(env);
+unsigned long level, next_level;
+uint32_t num_cpus_next_level, offset_next_level;
+
+/*
+ * Initialize the bitmap to decide which levels should be
+ * encoded in 0x1f.
+ */
+if (!count) {
+/* SMT and core levels are exposed in 0x1f leaf by default. */
+set_bit(CPU_TOPO_LEVEL_SMT, topo_bitmap);
+set_bit(CPU_TOPO_LEVEL_CORE, topo_bitmap);
+
+if (env->nr_dies > 1) {
+set_bit(CPU_TOPO_LEVEL_DIE, topo_bitmap);
+}
+}
+
+*ecx = count & 0xff;
+*edx = cpu->apic_id;
+
+level = find_first_bit(topo_bitmap, CPU_TOPO_LEVEL_MAX);
+if (level == CPU_TOPO_LEVEL_MAX) {
+num_cpus_next_level = 0;
+offset_next_level = 0;
+
+/* Encode CPU_TOPO_LEVEL_INVALID into the last subleaf of 0x1f. */
+level = CPU_TOPO_LEVEL_INVALID;
+} else {
+next_level = find_next_bit(topo_bitmap, CPU_TOPO_LEVEL_MAX, level + 1);
+if (next_level == CPU_TOPO_LEVEL_MAX) {
+next_level = CPU_TOPO_LEVEL_PACKAGE;
+}
+
+num_cpus_next_level = num_cpus_by_topo_level(topo_info, next_level);
+offset_next_level = apicid_offset_by_topo_level(topo_info, next_level);
+}
+
+*eax = offset_next_level;
+*ebx = num_cpus_next_level;
+*ecx |= cpuid1f_topo_type(level) << 8;
+
+assert(!(*eax & ~0x1f));
+*ebx &= 0xffff; /* The count doesn't need to be reliable. */
+if (level != CPU_TOPO_LEVEL_MAX) {
+clear_bit(level, topo_bitmap);
+}
+}
+
 /* Encode cache info for CPUID[0x80000005].ECX or CPUID[0x80000005].EDX */
 static uint32_t encode_cache_cpuid80000005(CPUCacheInfo *cache)
 {
@@ -6280,31 +6390,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
 break;
 }
 
-*ecx = count & 0xff;
-*edx = cpu->apic_id;
-s

[PATCH v4 00/21] Support smp.clusters for x86 in QEMU

2023-09-14 Thread Zhao Liu
From: Zhao Liu 

Hi list,

(CC k...@vger.kernel.org for better browsing.)

This is our v4 patch series, rebased on the master branch at the
commit 9ef497755afc2 ("Merge tag 'pull-vfio-20230911' of
https://github.com/legoater/qemu into staging").

Compared with v3 [1], v4 mainly refactors the CPUID[0x1F] encoding and
exposes the module level in CPUID[0x1F] with these new patches:

* [PATCH v4 08/21] i386: Split topology types of CPUID[0x1F] from the
definitions of CPUID[0xB]
* [PATCH v4 09/21] i386: Decouple CPUID[0x1F] subleaf with specific
topology level
* [PATCH v4 12/21] i386: Expose module level in CPUID[0x1F]

v4 also fixes compile warnings and cache topology initialization
bugs for some AMD CPUs.

Welcome your comments!


# Introduction

This series adds cluster support for the x86 PC machine, which allows
x86 to use smp.clusters to configure the module-level CPU topology of
x86.

And because of the compatibility issue (see section: ## Why not share
L2 cache in cluster directly), this series also introduces a new
property to adjust the topology of the x86 L2 cache.

Welcome your comments!


# Background

The "clusters" parameter in "smp" is introduced by ARM [2], but x86
hasn't supported it.

At present, x86 defaults to sharing the L2 cache within one core, but
this is not enough. There are platforms where multiple cores share the
same L2 cache, e.g., Alder Lake-P shares an L2 cache per module of
Atom cores [3], that is, every four Atom cores share one L2 cache.
Therefore, we need the new CPU topology level (cluster/module).

Another reason is the hybrid architecture. Cluster support not only
provides another level of topology definition in x86, but also provides
the code changes required for our future hybrid topology support.


# Overview

## Introduction of module level for x86

"cluster" in smp is the CPU topology level which is between "core" and
die.

For x86, the "cluster" in smp is corresponding to the module level [4],
which is above the core level. So use the "module" other than "cluster"
in x86 code.

And please note that x86 already has a CPU topology level also named
"cluster" [4]; that level sits above the package. The cluster in the
x86 CPU topology is completely different from the "clusters" smp
parameter. After the module level is introduced, the cluster smp
parameter will actually refer to the module level of x86.


## Why not share L2 cache in cluster directly

Though "clusters" was introduced to help define L2 cache topology
[2], using cluster to define x86's L2 cache topology will cause the
compatibility problem:

Currently, x86 defaults to sharing the L2 cache within one core, which
actually implies a default setting of "1 core per L2 cache" and
therefore implicitly defaults to having as many L2 caches as cores.

For example (i386 PC machine):
-smp 16,sockets=2,dies=2,cores=2,threads=2,maxcpus=16 (*)

Considering the topology of the L2 cache, this (*) implicitly means "1
core per L2 cache" and "2 L2 caches per die".

If we use cluster to configure L2 cache topology with the new default
setting "clusters per L2 cache is 1", the above semantics will change
to "2 cores per cluster" and "1 cluster per L2 cache", that is, "2
cores per L2 cache".

So the same command (*) will cause changes in the L2 cache topology,
further affecting the performance of the virtual machine.

Therefore, x86 should only treat cluster as a cpu topology level and
avoid using it to change L2 cache by default for compatibility.
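The arithmetic behind the compatibility concern can be sketched as follows (a toy model, not QEMU code; it assumes the L2 sharing domain divides the cores per die evenly):

```c
#include <assert.h>

/* Number of L2 caches per die, given the cores per die and the number
 * of cores sharing each L2 cache. With the historical default of
 * "1 core per L2 cache", -smp sockets=2,dies=2,cores=2,threads=2
 * implies 2 L2 caches per die; if clusters (defaulting to 1, so 2
 * cores per cluster) defined the L2 domain, the same command line
 * would imply a single, larger L2 per die. */
static int l2_caches_per_die(int cores_per_die, int cores_per_l2)
{
    return cores_per_die / cores_per_l2;
}
```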


## module level in CPUID

Linux kernel (from v6.4, with commit edc0a2b595765 ("x86/topology: Fix
erroneous smp_num_siblings on Intel Hybrid platforms")) is able to
handle platforms with the Module level enumerated via CPUID.1F.

Expose the module level in CPUID[0x1F] (for Intel CPUs) if the machine
has more than one module (since v3).

We can configure CPUID.04H.02H (L2 cache topology) with the module
level via a new property:

"-cpu,x-l2-cache-topo=cluster"

For more information about this property, please see the section: "## New
property: x-l2-cache-topo".


## New cache topology info in CPUCacheInfo

Currently, by default, the cache topology is encoded as:
1. i/d cache is shared in one core.
2. L2 cache is shared in one core.
3. L3 cache is shared in one die.

This default general setting has caused a misunderstanding, that is, the
cache topology is completely equated with a specific cpu topology, such
as the connection between L2 cache and core level, and the connection
between L3 cache and die level.

In fact, the settings of these topologies depend on the specific
platform and are not static. For example, on Alder Lake-P, every
four Atom cores share the same L2 cache [2].

Thus, in this patch set, we explicitly define the corresponding cache
topology for different cpu models, and this has two benefits:
1. Easy to expand to new CPU models in the future, which have different
   cache topologies.
2. It can easily support custom cache topology by some command (e.g.,

[PATCH v4 20/21] i386: Use CPUCacheInfo.share_level to encode CPUID[0x8000001D].EAX[bits 25:14]

2023-09-14 Thread Zhao Liu
From: Zhao Liu 

CPUID[0x8000001D].EAX[bits 25:14] NumSharingCache: number of logical
processors sharing cache.

The number of logical processors sharing this cache is
NumSharingCache + 1.

After cache models have topology information, we can use
CPUCacheInfo.share_level to decide which topology level is encoded
into CPUID[0x8000001D].EAX[bits 25:14].
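The field itself is "number of sharers minus one" in EAX[25:14]. A hedged sketch of that encoding (`num_sharing_field` is an illustrative helper, not the patch's function; the patch obtains the sharer count as 1 << (APIC ID offset of the cache's share level)):

```c
#include <assert.h>
#include <stdint.h>

/* Encode NumSharingCache: the number of logical processors sharing the
 * cache, minus one, placed in EAX bits 25:14. */
static uint32_t num_sharing_field(uint32_t sharers)
{
    return (sharers - 1) << 14;
}
```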

Signed-off-by: Zhao Liu 
---
Changes since v3:
 * Explain what "CPUID[0x801D].EAX[bits 25:14]" means in the commit
   message. (Babu)

Changes since v1:
 * Use cache->share_level as the parameter in
   max_processor_ids_for_cache().
---
 target/i386/cpu.c | 10 +-
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index bc28c59df089..3bed823dc3b7 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -482,20 +482,12 @@ static void encode_cache_cpuid8000001d(CPUCacheInfo 
*cache,
uint32_t *eax, uint32_t *ebx,
uint32_t *ecx, uint32_t *edx)
 {
-uint32_t num_sharing_cache;
 assert(cache->size == cache->line_size * cache->associativity *
   cache->partitions * cache->sets);
 
 *eax = CACHE_TYPE(cache->type) | CACHE_LEVEL(cache->level) |
(cache->self_init ? CACHE_SELF_INIT_LEVEL : 0);
-
-/* L3 is shared among multiple cores */
-if (cache->level == 3) {
-num_sharing_cache = 1 << apicid_die_offset(topo_info);
-} else {
-num_sharing_cache = 1 << apicid_core_offset(topo_info);
-}
-*eax |= (num_sharing_cache - 1) << 14;
+*eax |= max_processor_ids_for_cache(topo_info, cache->share_level) << 14;
 
 assert(cache->line_size > 0);
 assert(cache->partitions > 0);
-- 
2.34.1




[PATCH v4 17/21] i386: Add cache topology info in CPUCacheInfo

2023-09-14 Thread Zhao Liu
From: Zhao Liu 

Currently, by default, the cache topology is encoded as:
1. i/d cache is shared in one core.
2. L2 cache is shared in one core.
3. L3 cache is shared in one die.

This default general setting has caused a misunderstanding, that is, the
cache topology is completely equated with a specific cpu topology, such
as the connection between L2 cache and core level, and the connection
between L3 cache and die level.

In fact, the settings of these topologies depend on the specific
platform and are not static. For example, on Alder Lake-P, every
four Atom cores share the same L2 cache.

Thus, we should explicitly define the corresponding cache topology for
different cache models to increase scalability.

Except legacy_l2_cache_cpuid2 (its default topo level is
CPU_TOPO_LEVEL_INVALID), explicitly set the corresponding topology level
for all other cache models. In order to be compatible with the existing
cache topology, set the CPU_TOPO_LEVEL_CORE level for the i/d cache, set
the CPU_TOPO_LEVEL_CORE level for L2 cache, and set the
CPU_TOPO_LEVEL_DIE level for L3 cache.

The field for CPUID[4].EAX[bits 25:14] or CPUID[0x8000001D].EAX[bits
25:14] will be set based on CPUCacheInfo.share_level.

Signed-off-by: Zhao Liu 
---
Changes since v3:
 * Fix cache topology uninitialization bugs for some AMD CPUs. (Babu)
 * Move the CPUTopoLevel enumeration definition to the previous 0x1f
   rework patch.

Changes since v1:
 * Add the prefix "CPU_TOPO_LEVEL_*" for CPU topology level names.
   (Yanan)
 * (Revert, pls refer "i386: Decouple CPUID[0x1F] subleaf with specific
   topology level") Rename the "INVALID" level to CPU_TOPO_LEVEL_UNKNOW.
   (Yanan)
---
 target/i386/cpu.c | 36 
 target/i386/cpu.h |  7 +++
 2 files changed, 43 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index d8c5e774cf95..fcb4f4da3431 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -557,6 +557,7 @@ static CPUCacheInfo legacy_l1d_cache = {
 .sets = 64,
 .partitions = 1,
 .no_invd_sharing = true,
+.share_level = CPU_TOPO_LEVEL_CORE,
 };
 
 /*FIXME: CPUID leaf 0x8005 is inconsistent with leaves 2 & 4 */
@@ -571,6 +572,7 @@ static CPUCacheInfo legacy_l1d_cache_amd = {
 .partitions = 1,
 .lines_per_tag = 1,
 .no_invd_sharing = true,
+.share_level = CPU_TOPO_LEVEL_CORE,
 };
 
 /* L1 instruction cache: */
@@ -584,6 +586,7 @@ static CPUCacheInfo legacy_l1i_cache = {
 .sets = 64,
 .partitions = 1,
 .no_invd_sharing = true,
+.share_level = CPU_TOPO_LEVEL_CORE,
 };
 
 /*FIXME: CPUID leaf 0x8005 is inconsistent with leaves 2 & 4 */
@@ -598,6 +601,7 @@ static CPUCacheInfo legacy_l1i_cache_amd = {
 .partitions = 1,
 .lines_per_tag = 1,
 .no_invd_sharing = true,
+.share_level = CPU_TOPO_LEVEL_CORE,
 };
 
 /* Level 2 unified cache: */
@@ -611,6 +615,7 @@ static CPUCacheInfo legacy_l2_cache = {
 .sets = 4096,
 .partitions = 1,
 .no_invd_sharing = true,
+.share_level = CPU_TOPO_LEVEL_CORE,
 };
 
 /*FIXME: CPUID leaf 2 descriptor is inconsistent with CPUID leaf 4 */
@@ -620,6 +625,7 @@ static CPUCacheInfo legacy_l2_cache_cpuid2 = {
 .size = 2 * MiB,
 .line_size = 64,
 .associativity = 8,
+.share_level = CPU_TOPO_LEVEL_INVALID,
 };
 
 
@@ -633,6 +639,7 @@ static CPUCacheInfo legacy_l2_cache_amd = {
 .associativity = 16,
 .sets = 512,
 .partitions = 1,
+.share_level = CPU_TOPO_LEVEL_CORE,
 };
 
 /* Level 3 unified cache: */
@@ -648,6 +655,7 @@ static CPUCacheInfo legacy_l3_cache = {
 .self_init = true,
 .inclusive = true,
 .complex_indexing = true,
+.share_level = CPU_TOPO_LEVEL_DIE,
 };
 
 /* TLB definitions: */
@@ -1944,6 +1952,7 @@ static const CPUCaches epyc_cache_info = {
 .lines_per_tag = 1,
 .self_init = 1,
 .no_invd_sharing = true,
+.share_level = CPU_TOPO_LEVEL_CORE,
 },
 .l1i_cache = &(CPUCacheInfo) {
 .type = INSTRUCTION_CACHE,
@@ -1956,6 +1965,7 @@ static const CPUCaches epyc_cache_info = {
 .lines_per_tag = 1,
 .self_init = 1,
 .no_invd_sharing = true,
+.share_level = CPU_TOPO_LEVEL_CORE,
 },
 .l2_cache = &(CPUCacheInfo) {
 .type = UNIFIED_CACHE,
@@ -1966,6 +1976,7 @@ static const CPUCaches epyc_cache_info = {
 .partitions = 1,
 .sets = 1024,
 .lines_per_tag = 1,
+.share_level = CPU_TOPO_LEVEL_CORE,
 },
 .l3_cache = &(CPUCacheInfo) {
 .type = UNIFIED_CACHE,
@@ -1979,6 +1990,7 @@ static const CPUCaches epyc_cache_info = {
 .self_init = true,
 .inclusive = true,
 .complex_indexing = true,
+.share_level = CPU_TOPO_LEVEL_DIE,
 },
 };
 
@@ -1994,6 +2006,7 @@ static CPUCaches epyc_v4_cache_info = {
 .lines_per_tag = 1,
 .self_init = 1,
 .no_invd_sharing = true,
+.share_level = CPU_TOPO_LEVEL_CORE,
 },
 .l1i_cache = &(CPUCacheInfo) {

[PATCH v4 19/21] i386: Use offsets get NumSharingCache for CPUID[0x8000001D].EAX[bits 25:14]

2023-09-14 Thread Zhao Liu
From: Zhao Liu 

The commit 8f4202fb1080 ("i386: Populate AMD Processor Cache Information
for cpuid 0x801D") adds the cache topology for AMD CPU by encoding
the number of sharing threads directly.

From AMD's APM, NumSharingCache (CPUID[0x8000001D].EAX[bits 25:14])
means [1]:

The number of logical processors sharing this cache is the value of
this field incremented by 1. To determine which logical processors are
sharing a cache, determine a Share Id for each processor as follows:

ShareId = LocalApicId >> log2(NumSharingCache+1)

Logical processors with the same ShareId then share a cache. If
NumSharingCache+1 is not a power of two, round it up to the next power
of two.

From the description above, the calculation of this field should be the
same as that of CPUID[4].EAX[bits 25:14] for Intel CPUs. So also use the
offsets of the APIC ID to calculate this field.

[1]: APM, vol.3, appendix.E.4.15 Function 8000_001Dh--Cache Topology
 Information
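The APM's ShareId rule can be sketched directly (a minimal model, assuming the helper names are illustrative; the round-up-to-power-of-two step follows the quoted text):

```c
#include <assert.h>
#include <stdint.h>

/* Round v up to the next power of two (v >= 1). */
static uint32_t round_up_pow2(uint32_t v)
{
    uint32_t p = 1;
    while (p < v) {
        p <<= 1;
    }
    return p;
}

/* ShareId = LocalApicId >> log2(NumSharingCache + 1), with the divisor
 * rounded up to a power of two when it isn't one already. Logical
 * processors with the same ShareId share the cache. */
static uint32_t share_id(uint32_t local_apic_id, uint32_t num_sharing_cache)
{
    uint32_t n = round_up_pow2(num_sharing_cache + 1);
    uint32_t shift = 0;
    while ((1u << shift) < n) {
        shift++;
    }
    return local_apic_id >> shift;
}
```

For example, with NumSharingCache = 15 (16 threads share an L3), APIC IDs 0..15 map to ShareId 0 and APIC ID 16 maps to ShareId 1.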

Signed-off-by: Zhao Liu 
---
Changes since v3:
 * Rewrite the subject. (Babu)
 * Delete the original "comment/help" expression, as this behavior is
   confirmed for AMD CPUs. (Babu)
 * Rename "num_apic_ids" (v3) to "num_sharing_cache" to match spec
   definition. (Babu)

Changes since v1:
 * Rename "l3_threads" to "num_apic_ids" in
   encode_cache_cpuid8000001d(). (Yanan)
 * Add the description of the original commit and add Cc.
---
 target/i386/cpu.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 5d066107d6ce..bc28c59df089 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -482,7 +482,7 @@ static void encode_cache_cpuid8000001d(CPUCacheInfo *cache,
uint32_t *eax, uint32_t *ebx,
uint32_t *ecx, uint32_t *edx)
 {
-uint32_t l3_threads;
+uint32_t num_sharing_cache;
 assert(cache->size == cache->line_size * cache->associativity *
   cache->partitions * cache->sets);
 
@@ -491,13 +491,11 @@ static void encode_cache_cpuid801d(CPUCacheInfo 
*cache,
 
 /* L3 is shared among multiple cores */
 if (cache->level == 3) {
-l3_threads = topo_info->modules_per_die *
- topo_info->cores_per_module *
- topo_info->threads_per_core;
-*eax |= (l3_threads - 1) << 14;
+num_sharing_cache = 1 << apicid_die_offset(topo_info);
 } else {
-*eax |= ((topo_info->threads_per_core - 1) << 14);
+num_sharing_cache = 1 << apicid_core_offset(topo_info);
 }
+*eax |= (num_sharing_cache - 1) << 14;
 
 assert(cache->line_size > 0);
 assert(cache->partitions > 0);
-- 
2.34.1




[PATCH v4 13/21] i386: Support module_id in X86CPUTopoIDs

2023-09-14 Thread Zhao Liu
From: Zhuocheng Ding 

Add module_id member in X86CPUTopoIDs.

module_id can be parsed from the APIC ID, so also update the APIC ID
parsing rule to support the module level. With this support, the
conversions with the module level between X86CPUTopoIDs, X86CPUTopoInfo
and the APIC ID are complete.

module_id can also be generated from the cpu topology, and before i386
supports "clusters" in smp, the default "clusters per die" is only 1,
thus the module_id generated in this way is 0, so that it will not
conflict with the module_id parsed from the APIC ID.
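The APIC ID composition with the module level slotted between core and die can be sketched with fixed example widths (illustrative only; QEMU computes the offsets from the actual topology, and the names here are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Example bit widths for a topology with 2 threads/core, 2 cores/module,
 * 2 modules/die, 2 dies/package; each level's offset is the sum of the
 * widths below it, mirroring apicid_*_offset(). */
enum { SMT_WIDTH = 1, CORE_WIDTH = 1, MODULE_WIDTH = 1, DIE_WIDTH = 1 };

static uint32_t apicid_from_topo(unsigned pkg, unsigned die,
                                 unsigned module, unsigned core,
                                 unsigned smt)
{
    unsigned core_off = SMT_WIDTH;
    unsigned module_off = core_off + CORE_WIDTH;
    unsigned die_off = module_off + MODULE_WIDTH;
    unsigned pkg_off = die_off + DIE_WIDTH;

    return (pkg << pkg_off) | (die << die_off) |
           (module << module_off) | (core << core_off) | smt;
}
```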

Signed-off-by: Zhuocheng Ding 
Co-developed-by: Zhao Liu 
Signed-off-by: Zhao Liu 
Acked-by: Michael S. Tsirkin 
---
Changes since v1:
 * Merge the patch "i386: Update APIC ID parsing rule to support module
   level" into this one. (Yanan)
 * Move the apicid_module_width() and apicid_module_offset() support
   into the previous modules_per_die related patch. (Yanan)
---
 hw/i386/x86.c  | 28 +---
 include/hw/i386/topology.h | 17 +
 2 files changed, 34 insertions(+), 11 deletions(-)

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 267bb0f96ca5..5b05dbdedbff 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -311,11 +311,11 @@ void x86_cpu_pre_plug(HotplugHandler *hotplug_dev,
 
 /*
  * If APIC ID is not set,
- * set it based on socket/die/core/thread properties.
+ * set it based on socket/die/cluster/core/thread properties.
  */
 if (cpu->apic_id == UNASSIGNED_APIC_ID) {
-int max_socket = (ms->smp.max_cpus - 1) /
-smp_threads / smp_cores / ms->smp.dies;
+int max_socket = (ms->smp.max_cpus - 1) / smp_threads / smp_cores /
+ms->smp.clusters / ms->smp.dies;
 
 /*
  * die-id was optional in QEMU 4.0 and older, so keep it optional
@@ -362,6 +362,14 @@ void x86_cpu_pre_plug(HotplugHandler *hotplug_dev,
 topo_ids.die_id = cpu->die_id;
 topo_ids.core_id = cpu->core_id;
 topo_ids.smt_id = cpu->thread_id;
+
+/*
+ * TODO: This is the temporary initialization for topo_ids.module_id to
+ * avoid "maybe-uninitialized" compilation errors. Will remove when
+ * X86CPU supports cluster_id.
+ */
+topo_ids.module_id = 0;
+
 cpu->apic_id = x86_apicid_from_topo_ids(&topo_info, &topo_ids);
 }
 
@@ -370,11 +378,13 @@ void x86_cpu_pre_plug(HotplugHandler *hotplug_dev,
 MachineState *ms = MACHINE(x86ms);
 
 x86_topo_ids_from_apicid(cpu->apic_id, &topo_info, &topo_ids);
+
 error_setg(errp,
-"Invalid CPU [socket: %u, die: %u, core: %u, thread: %u] with"
-" APIC ID %" PRIu32 ", valid index range 0:%d",
-topo_ids.pkg_id, topo_ids.die_id, topo_ids.core_id, 
topo_ids.smt_id,
-cpu->apic_id, ms->possible_cpus->len - 1);
+"Invalid CPU [socket: %u, die: %u, module: %u, core: %u, thread: 
%u]"
+" with APIC ID %" PRIu32 ", valid index range 0:%d",
+topo_ids.pkg_id, topo_ids.die_id, topo_ids.module_id,
+topo_ids.core_id, topo_ids.smt_id, cpu->apic_id,
+ms->possible_cpus->len - 1);
 return;
 }
 
@@ -495,6 +505,10 @@ const CPUArchIdList 
*x86_possible_cpu_arch_ids(MachineState *ms)
 ms->possible_cpus->cpus[i].props.has_die_id = true;
 ms->possible_cpus->cpus[i].props.die_id = topo_ids.die_id;
 }
+if (ms->smp.clusters > 1) {
+ms->possible_cpus->cpus[i].props.has_cluster_id = true;
+ms->possible_cpus->cpus[i].props.cluster_id = topo_ids.module_id;
+}
 ms->possible_cpus->cpus[i].props.has_core_id = true;
 ms->possible_cpus->cpus[i].props.core_id = topo_ids.core_id;
 ms->possible_cpus->cpus[i].props.has_thread_id = true;
diff --git a/include/hw/i386/topology.h b/include/hw/i386/topology.h
index c807d3811dd3..3cec97b377f2 100644
--- a/include/hw/i386/topology.h
+++ b/include/hw/i386/topology.h
@@ -50,6 +50,7 @@ typedef uint32_t apic_id_t;
 typedef struct X86CPUTopoIDs {
 unsigned pkg_id;
 unsigned die_id;
+unsigned module_id;
 unsigned core_id;
 unsigned smt_id;
 } X86CPUTopoIDs;
@@ -127,6 +128,7 @@ static inline apic_id_t 
x86_apicid_from_topo_ids(X86CPUTopoInfo *topo_info,
 {
 return (topo_ids->pkg_id  << apicid_pkg_offset(topo_info)) |
(topo_ids->die_id  << apicid_die_offset(topo_info)) |
+   (topo_ids->module_id << apicid_module_offset(topo_info)) |
(topo_ids->core_id << apicid_core_offset(topo_info)) |
topo_ids->smt_id;
 }
@@ -140,12 +142,16 @@ static inline void x86_topo_ids_from_idx(X86CPUTopoInfo 
*topo_info,
  X86CPUTopoIDs *topo_ids)
 {
 unsigned nr_dies = topo_info->dies_per_pkg;
-unsigned nr_cores = topo_info->cores_per_module *
-topo_info->modules_per_die;
+unsigned nr_modules = topo_inf

[PATCH v4 21/21] i386: Add new property to control L2 cache topo in CPUID.04H

2023-09-14 Thread Zhao Liu
From: Zhao Liu 

The property x-l2-cache-topo will be used to change the L2 cache
topology in CPUID.04H.

It allows the user to set whether the L2 cache is shared at the core
level or at the cluster level.

If the user passes "-cpu x-l2-cache-topo=[core|cluster]" then the older
L2 cache topology will be overridden by the new topology setting.

Here we expose "cluster" to the user instead of "module", to be
consistent with the "cluster-id" naming.

Since CPUID.04H is used by Intel CPUs, this property is available only
on Intel CPUs for now.

When necessary, it can be extended to CPUID.8000001DH for AMD CPUs.

(Tested the cache topology in CPUID[0x04] leaf with "x-l2-cache-topo=[
core|cluster]", and tested the live migration between the QEMUs w/ &
w/o this patch series.)

Signed-off-by: Zhao Liu 
Tested-by: Yongwei Ma 
---
Changes since v3:
 * Add description about test for live migration compatibility. (Babu)

Changes since v1:
 * Rename MODULE branch to CPU_TOPO_LEVEL_MODULE to match the previous
   renaming changes.
---
 target/i386/cpu.c | 34 +-
 target/i386/cpu.h |  2 ++
 2 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 3bed823dc3b7..b1282c8bd3b7 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -243,6 +243,9 @@ static uint32_t max_processor_ids_for_cache(X86CPUTopoInfo 
*topo_info,
 case CPU_TOPO_LEVEL_CORE:
 num_ids = 1 << apicid_core_offset(topo_info);
 break;
+case CPU_TOPO_LEVEL_MODULE:
+num_ids = 1 << apicid_module_offset(topo_info);
+break;
 case CPU_TOPO_LEVEL_DIE:
 num_ids = 1 << apicid_die_offset(topo_info);
 break;
@@ -251,7 +254,7 @@ static uint32_t max_processor_ids_for_cache(X86CPUTopoInfo 
*topo_info,
 break;
 default:
 /*
- * Currently there is no use case for SMT and MODULE, so use
+ * Currently there is no use case for SMT, so use
  * assert directly to facilitate debugging.
  */
 g_assert_not_reached();
@@ -7576,6 +7579,34 @@ static void x86_cpu_realizefn(DeviceState *dev, Error 
**errp)
 env->cache_info_amd.l3_cache = &legacy_l3_cache;
 }
 
+if (cpu->l2_cache_topo_level) {
+/*
+ * FIXME: Currently only supports changing CPUID[4] (for intel), and
+ * will support changing CPUID[0x8000001D] when necessary.
+ */
+if (!IS_INTEL_CPU(env)) {
+error_setg(errp, "only intel cpus supports x-l2-cache-topo");
+return;
+}
+
+if (!strcmp(cpu->l2_cache_topo_level, "core")) {
+env->cache_info_cpuid4.l2_cache->share_level = CPU_TOPO_LEVEL_CORE;
+} else if (!strcmp(cpu->l2_cache_topo_level, "cluster")) {
+/*
+ * We expose to users "cluster" instead of "module", to be
+ * consistent with "cluster-id" naming.
+ */
+env->cache_info_cpuid4.l2_cache->share_level =
+CPU_TOPO_LEVEL_MODULE;
+} else {
+error_setg(errp,
+   "x-l2-cache-topo doesn't support '%s', "
+   "and it only supports 'core' or 'cluster'",
+   cpu->l2_cache_topo_level);
+return;
+}
+}
+
 #ifndef CONFIG_USER_ONLY
 MachineState *ms = MACHINE(qdev_get_machine());
 qemu_register_reset(x86_cpu_machine_reset_cb, cpu);
@@ -8079,6 +8110,7 @@ static Property x86_cpu_properties[] = {
  false),
 DEFINE_PROP_BOOL("x-intel-pt-auto-level", X86CPU, intel_pt_auto_level,
  true),
+DEFINE_PROP_STRING("x-l2-cache-topo", X86CPU, l2_cache_topo_level),
 DEFINE_PROP_END_OF_LIST()
 };
 
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index a13132007415..05ffc4c1cc6e 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -2073,6 +2073,8 @@ struct ArchCPU {
 int32_t hv_max_vps;
 
 bool xen_vapic;
+
+char *l2_cache_topo_level;
 };
 
 
-- 
2.34.1




[PATCH v4 16/21] hw/i386/pc: Support smp.clusters for x86 PC machine

2023-09-14 Thread Zhao Liu
From: Zhuocheng Ding 

With module-level topology support added to X86CPU, we can now enable
support for the cluster parameter on PC machines. With this support,
we can define a 5-level x86 CPU topology with "-smp":

-smp cpus=*,maxcpus=*,sockets=*,dies=*,clusters=*,cores=*,threads=*.

Additionally, add the 5-level topology example in description of "-smp".

Signed-off-by: Zhuocheng Ding 
Signed-off-by: Zhao Liu 
Reviewed-by: Yanan Wang 
Acked-by: Michael S. Tsirkin 
---
 hw/i386/pc.c|  1 +
 qemu-options.hx | 10 +-
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 54838c0c411d..b2b8ec4968c9 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1888,6 +1888,7 @@ static void pc_machine_class_init(ObjectClass *oc, void 
*data)
 mc->default_cpu_type = TARGET_DEFAULT_CPU_TYPE;
 mc->nvdimm_supported = true;
 mc->smp_props.dies_supported = true;
+mc->smp_props.clusters_supported = true;
 mc->default_ram_id = "pc.ram";
 pcmc->default_smbios_ep_type = SMBIOS_ENTRY_POINT_TYPE_64;
 
diff --git a/qemu-options.hx b/qemu-options.hx
index 6be621c23249..ff4a67d6449d 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -334,14 +334,14 @@ SRST
 -smp 8,sockets=2,cores=2,threads=2,maxcpus=8
 
 The following sub-option defines a CPU topology hierarchy (2 sockets
-totally on the machine, 2 dies per socket, 2 cores per die, 2 threads
-per core) for PC machines which support sockets/dies/cores/threads.
-Some members of the option can be omitted but their values will be
-automatically computed:
+totally on the machine, 2 dies per socket, 2 clusters per die, 2 cores per
+cluster, 2 threads per core) for PC machines which support sockets/dies
+/clusters/cores/threads. Some members of the option can be omitted but
+their values will be automatically computed:
 
 ::
 
--smp 16,sockets=2,dies=2,cores=2,threads=2,maxcpus=16
+-smp 32,sockets=2,dies=2,clusters=2,cores=2,threads=2,maxcpus=32
 
 The following sub-option defines a CPU topology hierarchy (2 sockets
 totally on the machine, 2 clusters per socket, 2 cores per cluster,
-- 
2.34.1




[PATCH v4 11/21] i386: Support modules_per_die in X86CPUTopoInfo

2023-09-14 Thread Zhao Liu
From: Zhuocheng Ding 

Support module level in i386 cpu topology structure "X86CPUTopoInfo".

Since x86 does not yet support the "clusters" parameter in "-smp",
X86CPUTopoInfo.modules_per_die is currently always 1. Therefore, the
module level width in APIC ID, which can be calculated by
"apicid_bitwidth_for_count(topo_info->modules_per_die)", is always 0
for now, so we can directly add APIC ID related helpers to support
module level parsing.

In addition, update topology structure in test-x86-topo.c.

Signed-off-by: Zhuocheng Ding 
Co-developed-by: Zhao Liu 
Signed-off-by: Zhao Liu 
---
Changes since v3:
 * Drop the description about not exposing module level in commit
   message.
 * Update topology related calculation in newly added helpers:
   num_cpus_by_topo_level() and apicid_offset_by_topo_level().
 * Since the code change, drop the "Acked-by" tag.

Changes since v1:
 * Include module level related helpers (apicid_module_width() and
   apicid_module_offset()) in this patch. (Yanan)
---
 hw/i386/x86.c  |  3 ++-
 include/hw/i386/topology.h | 22 +++
 target/i386/cpu.c  | 17 +-
 tests/unit/test-x86-topo.c | 45 --
 4 files changed, 55 insertions(+), 32 deletions(-)

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 9c61b6882b99..267bb0f96ca5 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -72,7 +72,8 @@ static void init_topo_info(X86CPUTopoInfo *topo_info,
 MachineState *ms = MACHINE(x86ms);
 
 topo_info->dies_per_pkg = ms->smp.dies;
-topo_info->cores_per_die = ms->smp.cores;
+topo_info->modules_per_die = ms->smp.clusters;
+topo_info->cores_per_module = ms->smp.cores;
 topo_info->threads_per_core = ms->smp.threads;
 }
 
diff --git a/include/hw/i386/topology.h b/include/hw/i386/topology.h
index 5a19679f618b..c807d3811dd3 100644
--- a/include/hw/i386/topology.h
+++ b/include/hw/i386/topology.h
@@ -56,7 +56,8 @@ typedef struct X86CPUTopoIDs {
 
 typedef struct X86CPUTopoInfo {
 unsigned dies_per_pkg;
-unsigned cores_per_die;
+unsigned modules_per_die;
+unsigned cores_per_module;
 unsigned threads_per_core;
 } X86CPUTopoInfo;
 
@@ -77,7 +78,13 @@ static inline unsigned apicid_smt_width(X86CPUTopoInfo 
*topo_info)
 /* Bit width of the Core_ID field */
 static inline unsigned apicid_core_width(X86CPUTopoInfo *topo_info)
 {
-return apicid_bitwidth_for_count(topo_info->cores_per_die);
+return apicid_bitwidth_for_count(topo_info->cores_per_module);
+}
+
+/* Bit width of the Module_ID (cluster ID) field */
+static inline unsigned apicid_module_width(X86CPUTopoInfo *topo_info)
+{
+return apicid_bitwidth_for_count(topo_info->modules_per_die);
 }
 
 /* Bit width of the Die_ID field */
@@ -92,10 +99,16 @@ static inline unsigned apicid_core_offset(X86CPUTopoInfo 
*topo_info)
 return apicid_smt_width(topo_info);
 }
 
+/* Bit offset of the Module_ID (cluster ID) field */
+static inline unsigned apicid_module_offset(X86CPUTopoInfo *topo_info)
+{
+return apicid_core_offset(topo_info) + apicid_core_width(topo_info);
+}
+
 /* Bit offset of the Die_ID field */
 static inline unsigned apicid_die_offset(X86CPUTopoInfo *topo_info)
 {
-return apicid_core_offset(topo_info) + apicid_core_width(topo_info);
+return apicid_module_offset(topo_info) + apicid_module_width(topo_info);
 }
 
 /* Bit offset of the Pkg_ID (socket ID) field */
@@ -127,7 +140,8 @@ static inline void x86_topo_ids_from_idx(X86CPUTopoInfo 
*topo_info,
  X86CPUTopoIDs *topo_ids)
 {
 unsigned nr_dies = topo_info->dies_per_pkg;
-unsigned nr_cores = topo_info->cores_per_die;
+unsigned nr_cores = topo_info->cores_per_module *
+topo_info->modules_per_die;
 unsigned nr_threads = topo_info->threads_per_core;
 
 topo_ids->pkg_id = cpu_index / (nr_dies * nr_cores * nr_threads);
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 401409c5db08..cef9a4606d89 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -277,10 +277,11 @@ static uint32_t num_cpus_by_topo_level(X86CPUTopoInfo 
*topo_info,
 case CPU_TOPO_LEVEL_CORE:
 return topo_info->threads_per_core;
 case CPU_TOPO_LEVEL_DIE:
-return topo_info->threads_per_core * topo_info->cores_per_die;
+return topo_info->threads_per_core * topo_info->cores_per_module *
+   topo_info->modules_per_die;
 case CPU_TOPO_LEVEL_PACKAGE:
-return topo_info->threads_per_core * topo_info->cores_per_die *
-   topo_info->dies_per_pkg;
+return topo_info->threads_per_core * topo_info->cores_per_module *
+   topo_info->modules_per_die * topo_info->dies_per_pkg;
 default:
 g_assert_not_reached();
 }
@@ -449,7 +450,9 @@ static void encode_cache_cpuid801d(CPUCacheInfo *cache,
 
 /* L3 is shared among multiple cores */
 if (cache->level == 3) {
-l3_threads = topo_info->cores_per_die * t

[PATCH v4 10/21] i386: Introduce module-level cpu topology to CPUX86State

2023-09-14 Thread Zhao Liu
From: Zhuocheng Ding 

smp command has the "clusters" parameter but x86 hasn't supported that
level. "cluster" is a CPU topology level concept above cores, in which
the cores may share some resources (L2 cache or some others like L3
cache tags, depending on the Archs) [1][2]. For x86, the resource shared
by cores at the cluster level is mainly the L2 cache.

However, using cluster to define x86's L2 cache topology would cause a
compatibility problem:

Currently, x86 defaults to the L2 cache being shared within one core,
which actually implies a default setting of "1 core per L2 cache" and
therefore implicitly defaults to having as many L2 caches as cores.

For example (i386 PC machine):
-smp 16,sockets=2,dies=2,cores=2,threads=2,maxcpus=16 (*)

Considering the topology of the L2 cache, this (*) implicitly means "1
core per L2 cache" and "2 L2 caches per die".

If we use cluster to configure L2 cache topology with the new default
setting "clusters per L2 cache is 1", the above semantics will change
to "2 cores per cluster" and "1 cluster per L2 cache", that is, "2
cores per L2 cache".

So the same command (*) will cause changes in the L2 cache topology,
further affecting the performance of the virtual machine.

Therefore, x86 should only treat cluster as a cpu topology level and
avoid using it to change L2 cache by default for compatibility.

"cluster" in smp is the CPU topology level which is between "core" and
die.

For x86, the "cluster" in smp corresponds to the module level [2],
which is above the core level. So use "module" rather than "cluster"
in the i386 code.

And please note that x86 already has a CPU topology level also named
"cluster" [3]; that level sits above the package and is completely
different from the "clusters" smp parameter. After the module level is
introduced, the "clusters" smp parameter will actually refer to the
module level of x86.

[1]: 864c3b5c32f0 ("hw/core/machine: Introduce CPU cluster topology support")
[2]: Yanan's comment about "cluster",
 https://lists.gnu.org/archive/html/qemu-devel/2023-02/msg04051.html
[3]: SDM, vol.3, ch.9, 9.9.1 Hierarchical Mapping of Shared Resources.

Signed-off-by: Zhuocheng Ding 
Co-developed-by: Zhao Liu 
Signed-off-by: Zhao Liu 
Acked-by: Michael S. Tsirkin 
---
Changes since v1:
 * The background of the introduction of the "cluster" parameter and its
   exact meaning were revised according to Yanan's explanation. (Yanan)
---
 hw/i386/x86.c | 1 +
 target/i386/cpu.c | 1 +
 target/i386/cpu.h | 5 +
 3 files changed, 7 insertions(+)

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index f034df8bf628..9c61b6882b99 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -306,6 +306,7 @@ void x86_cpu_pre_plug(HotplugHandler *hotplug_dev,
 init_topo_info(&topo_info, x86ms);
 
 env->nr_dies = ms->smp.dies;
+env->nr_modules = ms->smp.clusters;
 
 /*
  * If APIC ID is not set,
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 10e11c85f459..401409c5db08 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -7696,6 +7696,7 @@ static void x86_cpu_initfn(Object *obj)
 CPUX86State *env = &cpu->env;
 
 env->nr_dies = 1;
+env->nr_modules = 1;
 cpu_set_cpustate_pointers(cpu);
 
 object_property_add(obj, "feature-words", "X86CPUFeatureWordInfo",
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 470257b92240..556e80f29764 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1903,6 +1903,11 @@ typedef struct CPUArchState {
 
 /* Number of dies within this CPU package. */
 unsigned nr_dies;
+/*
+ * Number of modules within this CPU package.
+ * Module level in x86 cpu topology is corresponding to smp.clusters.
+ */
+unsigned nr_modules;
 } CPUX86State;
 
 struct kvm_msrs;
-- 
2.34.1




[PATCH v4 04/21] hw/cpu: Update the comments of nr_cores and nr_dies

2023-09-14 Thread Zhao Liu
From: Zhao Liu 

In the nr_threads' comment, specify it represents the
number of threads in the "core" to avoid confusion.

Also add comment for nr_dies in CPUX86State.

Signed-off-by: Zhao Liu 
---
Changes since v3:
 * The new patch split out of CPUSTATE.nr_cores' fix. (Xiaoyao)
---
 include/hw/core/cpu.h | 2 +-
 target/i386/cpu.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index 92a4234439a3..df908b23c692 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -277,7 +277,7 @@ struct qemu_work_item;
  *   See TranslationBlock::TCG CF_CLUSTER_MASK.
  * @tcg_cflags: Pre-computed cflags for this cpu.
  * @nr_cores: Number of cores within this CPU package.
- * @nr_threads: Number of threads within this CPU.
+ * @nr_threads: Number of threads within this CPU core.
  * @running: #true if CPU is currently running (lockless).
  * @has_waiter: #true if a CPU is currently waiting for the cpu_exec_end;
  * valid under cpu_list_lock.
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index fbb05eace57e..70eb3bc23eb8 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1881,6 +1881,7 @@ typedef struct CPUArchState {
 
 TPRAccess tpr_access_type;
 
+/* Number of dies within this CPU package. */
 unsigned nr_dies;
 } CPUX86State;
 
-- 
2.34.1




[PATCH v4 15/21] tests: Add test case of APIC ID for module level parsing

2023-09-14 Thread Zhao Liu
From: Zhuocheng Ding 

After i386 supports module level, it's time to add the test for module
level's parsing.

Signed-off-by: Zhuocheng Ding 
Co-developed-by: Zhao Liu 
Signed-off-by: Zhao Liu 
Tested-by: Yongwei Ma 
Reviewed-by: Yanan Wang 
Acked-by: Michael S. Tsirkin 
---
 tests/unit/test-x86-topo.c | 19 +++
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/tests/unit/test-x86-topo.c b/tests/unit/test-x86-topo.c
index f21b8a5d95c2..55b731ccae55 100644
--- a/tests/unit/test-x86-topo.c
+++ b/tests/unit/test-x86-topo.c
@@ -37,6 +37,7 @@ static void test_topo_bits(void)
 topo_info = (X86CPUTopoInfo) {1, 1, 1, 1};
 g_assert_cmpuint(apicid_smt_width(&topo_info), ==, 0);
 g_assert_cmpuint(apicid_core_width(&topo_info), ==, 0);
+g_assert_cmpuint(apicid_module_width(&topo_info), ==, 0);
 g_assert_cmpuint(apicid_die_width(&topo_info), ==, 0);
 
 topo_info = (X86CPUTopoInfo) {1, 1, 1, 1};
@@ -74,13 +75,22 @@ static void test_topo_bits(void)
 topo_info = (X86CPUTopoInfo) {1, 1, 33, 2};
 g_assert_cmpuint(apicid_core_width(&topo_info), ==, 6);
 
-topo_info = (X86CPUTopoInfo) {1, 1, 30, 2};
+topo_info = (X86CPUTopoInfo) {1, 6, 30, 2};
+g_assert_cmpuint(apicid_module_width(&topo_info), ==, 3);
+topo_info = (X86CPUTopoInfo) {1, 7, 30, 2};
+g_assert_cmpuint(apicid_module_width(&topo_info), ==, 3);
+topo_info = (X86CPUTopoInfo) {1, 8, 30, 2};
+g_assert_cmpuint(apicid_module_width(&topo_info), ==, 3);
+topo_info = (X86CPUTopoInfo) {1, 9, 30, 2};
+g_assert_cmpuint(apicid_module_width(&topo_info), ==, 4);
+
+topo_info = (X86CPUTopoInfo) {1, 6, 30, 2};
 g_assert_cmpuint(apicid_die_width(&topo_info), ==, 0);
-topo_info = (X86CPUTopoInfo) {2, 1, 30, 2};
+topo_info = (X86CPUTopoInfo) {2, 6, 30, 2};
 g_assert_cmpuint(apicid_die_width(&topo_info), ==, 1);
-topo_info = (X86CPUTopoInfo) {3, 1, 30, 2};
+topo_info = (X86CPUTopoInfo) {3, 6, 30, 2};
 g_assert_cmpuint(apicid_die_width(&topo_info), ==, 2);
-topo_info = (X86CPUTopoInfo) {4, 1, 30, 2};
+topo_info = (X86CPUTopoInfo) {4, 6, 30, 2};
 g_assert_cmpuint(apicid_die_width(&topo_info), ==, 2);
 
 /* build a weird topology and see if IDs are calculated correctly
@@ -91,6 +101,7 @@ static void test_topo_bits(void)
 topo_info = (X86CPUTopoInfo) {1, 1, 6, 3};
 g_assert_cmpuint(apicid_smt_width(&topo_info), ==, 2);
 g_assert_cmpuint(apicid_core_offset(&topo_info), ==, 2);
+g_assert_cmpuint(apicid_module_offset(&topo_info), ==, 5);
 g_assert_cmpuint(apicid_die_offset(&topo_info), ==, 5);
 g_assert_cmpuint(apicid_pkg_offset(&topo_info), ==, 5);
 
-- 
2.34.1




[PATCH v4 18/21] i386: Use CPUCacheInfo.share_level to encode CPUID[4]

2023-09-14 Thread Zhao Liu
From: Zhao Liu 

CPUID[4].EAX[bits 25:14] is used to represent the cache topology for
Intel CPUs.

After cache models have topology information, we can use
CPUCacheInfo.share_level to decide which topology level to encode
into CPUID[4].EAX[bits 25:14].

And since maximum_processor_id (originally "num_apic_ids") is derived
from the CPU topology levels, which are verified when parsing -smp,
there is no need to check this value with "assert(num_apic_ids > 0)"
again, so remove this assert.

Additionally, wrap the encoding of CPUID[4].EAX[bits 31:26] into a
helper to make the code cleaner.

Signed-off-by: Zhao Liu 
---
Changes since v1:
 * Use "enum CPUTopoLevel share_level" as the parameter in
   max_processor_ids_for_cache().
 * Make cache_into_passthrough case also use
   max_processor_ids_for_cache() and max_core_ids_in_package() to
   encode CPUID[4]. (Yanan)
 * Rename the title of this patch (the original is "i386: Use
   CPUCacheInfo.share_level to encode CPUID[4].EAX[bits 25:14]").
---
 target/i386/cpu.c | 70 +--
 1 file changed, 43 insertions(+), 27 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index fcb4f4da3431..5d066107d6ce 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -234,22 +234,53 @@ static uint8_t cpuid2_cache_descriptor(CPUCacheInfo 
*cache)
((t) == UNIFIED_CACHE) ? CACHE_TYPE_UNIFIED : \
0 /* Invalid value */)
 
+static uint32_t max_processor_ids_for_cache(X86CPUTopoInfo *topo_info,
+enum CPUTopoLevel share_level)
+{
+uint32_t num_ids = 0;
+
+switch (share_level) {
+case CPU_TOPO_LEVEL_CORE:
+num_ids = 1 << apicid_core_offset(topo_info);
+break;
+case CPU_TOPO_LEVEL_DIE:
+num_ids = 1 << apicid_die_offset(topo_info);
+break;
+case CPU_TOPO_LEVEL_PACKAGE:
+num_ids = 1 << apicid_pkg_offset(topo_info);
+break;
+default:
+/*
+ * Currently there is no use case for SMT and MODULE, so use
+ * assert directly to facilitate debugging.
+ */
+g_assert_not_reached();
+}
+
+return num_ids - 1;
+}
+
+static uint32_t max_core_ids_in_package(X86CPUTopoInfo *topo_info)
+{
+uint32_t num_cores = 1 << (apicid_pkg_offset(topo_info) -
+   apicid_core_offset(topo_info));
+return num_cores - 1;
+}
 
 /* Encode cache info for CPUID[4] */
 static void encode_cache_cpuid4(CPUCacheInfo *cache,
-int num_apic_ids, int num_cores,
+X86CPUTopoInfo *topo_info,
 uint32_t *eax, uint32_t *ebx,
 uint32_t *ecx, uint32_t *edx)
 {
 assert(cache->size == cache->line_size * cache->associativity *
   cache->partitions * cache->sets);
 
-assert(num_apic_ids > 0);
 *eax = CACHE_TYPE(cache->type) |
CACHE_LEVEL(cache->level) |
(cache->self_init ? CACHE_SELF_INIT_LEVEL : 0) |
-   ((num_cores - 1) << 26) |
-   ((num_apic_ids - 1) << 14);
+   (max_core_ids_in_package(topo_info) << 26) |
+   (max_processor_ids_for_cache(topo_info, cache->share_level) << 14);
 
 assert(cache->line_size > 0);
 assert(cache->partitions > 0);
@@ -6258,56 +6289,41 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
 int host_vcpus_per_cache = 1 + ((*eax & 0x3FFC000) >> 14);
 
 if (cores_per_pkg > 1) {
-int addressable_cores_offset =
-apicid_pkg_offset(&topo_info) -
-apicid_core_offset(&topo_info);
-
 *eax &= ~0xFC00;
-*eax |= (1 << (addressable_cores_offset - 1)) << 26;
+*eax |= max_core_ids_in_package(&topo_info) << 26;
 }
 if (host_vcpus_per_cache > cpus_per_pkg) {
-int pkg_offset = apicid_pkg_offset(&topo_info);
-
 *eax &= ~0x3FFC000;
-*eax |= (1 << (pkg_offset - 1)) << 14;
+*eax |=
+max_processor_ids_for_cache(&topo_info,
+CPU_TOPO_LEVEL_PACKAGE) << 14;
 }
 }
 } else if (cpu->vendor_cpuid_only && IS_AMD_CPU(env)) {
 *eax = *ebx = *ecx = *edx = 0;
 } else {
 *eax = 0;
-int addressable_cores_offset = apicid_pkg_offset(&topo_info) -
-   apicid_core_offset(&topo_info);
-int core_offset, die_offset;
 
 switch (count) {
 case 0: /* L1 dcache info */
-core_offset = apicid_core_offset(&topo_info);
 encode_cache_cpuid4(env->cache_info_cpuid4.l

Re: [QEMU PATCH v4 10/13] virtio-gpu: Resource UUID

2023-09-14 Thread Akihiko Odaki

On 2023/09/13 23:18, Albert Esteve wrote:
> On Wed, Sep 13, 2023 at 3:43 PM Akihiko Odaki wrote:
>> On 2023/09/13 21:58, Albert Esteve wrote:
>>> On Wed, Sep 13, 2023 at 2:22 PM Akihiko Odaki wrote:
>>>> On 2023/09/13 20:34, Albert Esteve wrote:
>>>>> On Wed, Sep 13, 2023 at 12:34 PM Akihiko Odaki wrote:
>>>>>> On 2023/09/13 16:55, Albert Esteve wrote:
>>>>>>> Hi Antonio,
>>>>>>>
>>>>>>> If I'm not mistaken, this patch is related with:
>>>>>>> https://lists.gnu.org/archive/html/qemu-devel/2023-09/msg01853.html
>>>>>>>
>>>>>>> IMHO, ideally, virtio-gpu and vhost-user-gpu both would use the
>>>>>>> infrastructure from the patch I linked to store the virtio
>>>>>>> objects, so that they can be later shared with other devices.
>>>>>>
>>>>>> I don't think such sharing is possible because the resources are
>>>>>> identified by IDs that are local to the device. That also
>>>>>> complicates migration.
>>>>>>
>>>>>> Regards,
>>>>>> Akihiko Odaki
>>>>>
>>>>> Hi Akihiko,
>>>>>
>>>>> As far as I understand, the feature to export dma-bufs from the
>>>>> virtgpu was introduced as part of the virtio cross-device sharing
>>>>> proposal [1]. Thus, it shall be possible. When virtgpu ASSIGN_UUID,
>>>>> it exports and identifies the dmabuf resource, so that when the
>>>>> dmabuf gets shared inside the guest (e.g., with virtio-video), we
>>>>> can use the assigned UUID to find the dmabuf in the host (using
>>>>> the patch that I linked above), and import it.
>>>>>
>>>>> [1] - https://lwn.net/Articles/828988/
>>>>
>>>> The problem is that virtio-gpu can have other kinds of resources
>>>> like pixman and OpenGL textures and manage them and DMA-BUFs with a
>>>> unified resource ID.
>>>
>>> I see.
>>>
>>>> So you cannot change:
>>>> g_hash_table_insert(g->resource_uuids,
>>>>                     GUINT_TO_POINTER(assign.resource_id), uuid);
>>>> by:
>>>> virtio_add_dmabuf(uuid, assign.resource_id);
>>>>
>>>> assign.resource_id is not a DMA-BUF file descriptor, and the
>>>> underlying resource may not be a DMA-BUF in the first place.
>>>
>>> I didn't really look into the patch in depth, so the code was
>>> intended to give an idea of how the implementation would look with
>>> the cross-device patch API. Indeed, it is not the resource_id (I
>>> just took a brief look at the virtio specification 1.2), but the
>>> underlying resource that we want to use here.
>>>
>>>> Also, since this lives in the common code that is not used only by
>>>> virtio-gpu-gl but also virtio-gpu, w

Re: [PATCH v11 6/9] gfxstream + rutabaga: add initial support for gfxstream

2023-09-14 Thread Bernhard Beschow



Am 14. September 2023 04:38:51 UTC schrieb Gurchetan Singh 
:
>On Wed, Sep 13, 2023 at 4:58 AM Bernhard Beschow  wrote:
>
>>
>>
>> Am 23. August 2023 01:25:38 UTC schrieb Gurchetan Singh <
>> gurchetansi...@chromium.org>:
>> >This adds initial support for gfxstream and cross-domain.  Both
>> >features rely on virtio-gpu blob resources and context types, which
>> >are also implemented in this patch.
>> >
>> >gfxstream has a long and illustrious history in Android graphics
>> >paravirtualization.  It has been powering graphics in the Android
>> >Studio Emulator for more than a decade, which is the main developer
>> >platform.
>> >
>> >Originally conceived by Jesse Hall, it was first known as "EmuGL" [a].
>> >The key design characteristic was a 1:1 threading model and
>> >auto-generation, which fit nicely with the OpenGLES spec.  It also
>> >allowed easy layering with ANGLE on the host, which provides the GLES
>> >implementations on Windows or MacOS environments.
>> >
>> >gfxstream has traditionally been maintained by a single engineer, and
>> >between 2015 to 2021, the goldfish throne passed to Frank Yang.
>> >Historians often remark this glorious reign ("pax gfxstreama" is the
>> >academic term) was comparable to that of Augustus and both Queen
>> >Elizabeths.  Just to name a few accomplishments in a resplendent
>> >panoply: higher versions of GLES, address space graphics, snapshot
>> >support and CTS compliant Vulkan [b].
>> >
>> >One major drawback was the use of out-of-tree goldfish drivers.
>> >Android engineers didn't know much about DRM/KMS and especially TTM so
>> >a simple guest to host pipe was conceived.
>> >
>> >Luckily, virtio-gpu 3D started to emerge in 2016 due to the work of
>> >the Mesa/virglrenderer communities.  In 2018, the initial virtio-gpu
>> >port of gfxstream was done by Cuttlefish enthusiast Alistair Delva.
>> >It was a symbol compatible replacement of virglrenderer [c] and named
>> >"AVDVirglrenderer".  This implementation forms the basis of the
>> >current gfxstream host implementation still in use today.
>> >
>> >cross-domain support follows a similar arc.  Originally conceived by
>> >Wayland aficionado David Reveman and crosvm enjoyer Zach Reizner in
>> >2018, it initially relied on the downstream "virtio-wl" device.
>> >
>> >In 2020 and 2021, virtio-gpu was extended to include blob resources
>> >and multiple timelines by yours truly, features gfxstream/cross-domain
>> >both require to function correctly.
>> >
>> >Right now, we stand at the precipice of a truly fantastic possibility:
>> >the Android Emulator powered by upstream QEMU and upstream Linux
>> >kernel.  gfxstream will then be packaged properfully, and app
>> >developers can even fix gfxstream bugs on their own if they encounter
>> >them.
>> >
>> >It's been quite the ride, my friends.  Where will gfxstream head next,
>> >nobody really knows.  I wouldn't be surprised if it's around for
>> >another decade, maintained by a new generation of Android graphics
>> >enthusiasts.
>> >
>> >Technical details:
>> >  - Very simple initial display integration: just used Pixman
>> >  - Largely, 1:1 mapping of virtio-gpu hypercalls to rutabaga function
>> >calls
>> >
>> >Next steps for Android VMs:
>> >  - The next step would be improving display integration and UI interfaces
>> >with the goal of the QEMU upstream graphics being in an emulator
>> >release [d].
>> >
>> >Next steps for Linux VMs for display virtualization:
>> >  - For widespread distribution, someone needs to package Sommelier or the
>> >wayland-proxy-virtwl [e] ideally into Debian main. In addition, newer
>> >versions of the Linux kernel come with DRM_VIRTIO_GPU_KMS option,
>> >which allows disabling KMS hypercalls.  If anyone cares enough, it'll
>> >probably be possible to build a custom VM variant that uses this
>> display
>> >virtualization strategy.
>> >
>> >[a]
>> https://android-review.googlesource.com/c/platform/development/+/34470
>> >[b]
>> https://android-review.googlesource.com/q/topic:%22vulkan-hostconnection-start%22
>> >[c]
>> https://android-review.googlesource.com/c/device/generic/goldfish-opengl/+/761927
>> >[d] https://developer.android.com/studio/releases/emulator
>> >[e] https://github.com/talex5/wayland-proxy-virtwl
>> >
>> >Signed-off-by: Gurchetan Singh 
>> >Tested-by: Alyssa Ross 
>> >Tested-by: Emmanouil Pitsidianakis 
>> >Reviewed-by: Emmanouil Pitsidianakis 
>> >---
>> >v1: Incorporated various suggestions by Akihiko Odaki and Bernard Berschow
>> >- Removed GET_VIRTIO_GPU_GL / GET_RUTABAGA macros
>> >- Used error_report(..)
>> >- Used g_autofree to fix leaks on error paths
>> >- Removed unnecessary casts
>> >- added virtio-gpu-pci-rutabaga.c + virtio-vga-rutabaga.c files
>> >
>> >v2: Incorporated various suggestions by Akihiko Odaki, Marc-André Lureau
>> and
>> >Bernard Berschow:
>> >- Parenthesis in CHECK macro
>> >- CHECK_RESULT(result, ..) --> CHECK(!result, ..)
>> >- dela

Re: [PATCH v4 04/21] hw/cpu: Update the comments of nr_cores and nr_dies

2023-09-14 Thread Philippe Mathieu-Daudé

On 14/9/23 09:21, Zhao Liu wrote:

From: Zhao Liu 

In the nr_threads' comment, specify it represents the
number of threads in the "core" to avoid confusion.

Also add comment for nr_dies in CPUX86State.

Signed-off-by: Zhao Liu 
---
Changes since v3:
  * The new patch split out of CPUSTATE.nr_cores' fix. (Xiaoyao)
---
  include/hw/core/cpu.h | 2 +-
  target/i386/cpu.h | 1 +
  2 files changed, 2 insertions(+), 1 deletion(-)


Reviewed-by: Philippe Mathieu-Daudé 




[PATCH v3] hw/cxl: Fix out of bound array access

2023-09-14 Thread Dmitry Frolov
According to cxl_interleave_ways_enc(), fw->num_targets is allowed to be up
to 16. This also corresponds to the CXL specs. So the fw->target_hbs[] array
is iterated from 0 to 15, but it is statically declared with a length of 8.
Thus, an out-of-bounds array access may occur.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

v2: assert added
v3: assert removed

Fixes: c28db9e000 ("hw/pci-bridge: Make PCIe and CXL PXB Devices inherit from 
TYPE_PXB_DEV")
Signed-off-by: Dmitry Frolov 
---
 include/hw/cxl/cxl.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/hw/cxl/cxl.h b/include/hw/cxl/cxl.h
index 56c9e7676e..4944725849 100644
--- a/include/hw/cxl/cxl.h
+++ b/include/hw/cxl/cxl.h
@@ -29,7 +29,7 @@ typedef struct PXBCXLDev PXBCXLDev;
 typedef struct CXLFixedWindow {
 uint64_t size;
 char **targets;
-PXBCXLDev *target_hbs[8];
+PXBCXLDev *target_hbs[16];
 uint8_t num_targets;
 uint8_t enc_int_ways;
 uint8_t enc_int_gran;
-- 
2.34.1




Re: [PATCH v4 03/21] softmmu: Fix CPUSTATE.nr_cores' calculation

2023-09-14 Thread Philippe Mathieu-Daudé

Hi,

On 14/9/23 09:21, Zhao Liu wrote:

From: Zhuocheng Ding 

From CPUState.nr_cores' comment, it represents "number of cores within
this CPU package".

After 003f230e37d7 ("machine: Tweak the order of topology members in
struct CpuTopology"), the meaning of smp.cores changed to "the number of
cores in one die", but that commit failed to update CPUState.nr_cores'
calculation, so CPUState.nr_cores became wrong: it no longer accounts
for the numbers of clusters and dies.

At present, only i386 is using CPUState.nr_cores.

But as for i386, which supports die level, the uses of CPUState.nr_cores
are very confusing:

Early uses are based on the meaning of "cores per package" (before die
is introduced into i386), and later uses are based on "cores per die"
(after die's introduction).

This difference is due to that commit a94e1428991f ("target/i386: Add
CPUID.1F generation support for multi-dies PCMachine") misunderstood
that CPUState.nr_cores means "cores per die" when calculated
CPUID.1FH.01H:EBX. After that, the changes in i386 all followed this
wrong understanding.

With the influence of 003f230e37d7 and a94e1428991f, for i386 currently
the result of CPUState.nr_cores is "cores per die", thus the original
uses of CPUState.cores based on the meaning of "cores per package" are
wrong when multiple dies exist:
1. In cpu_x86_cpuid() of target/i386/cpu.c, CPUID.01H:EBX[bits 23:16] is
incorrect because it expects "cpus per package" but now the
result is "cpus per die".
2. In cpu_x86_cpuid() of target/i386/cpu.c, for all leaves of CPUID.04H:
EAX[bits 31:26] is incorrect because they expect "cpus per package"
but now the result is "cpus per die". The error not only impacts the
EAX calculation in cache_info_passthrough case, but also impacts other
cases of setting cache topology for Intel CPU according to cpu
topology (specifically, the incoming parameter "num_cores" expects
"cores per package" in encode_cache_cpuid4()).
3. In cpu_x86_cpuid() of target/i386/cpu.c, CPUID.0BH.01H:EBX[bits
15:00] is incorrect because the EBX of 0BH.01H (core level) expects
"cpus per package", which may be different with 1FH.01H (The reason
is 1FH can support more levels. For QEMU, 1FH also supports die,
1FH.01H:EBX[bits 15:00] expects "cpus per die").
4. In cpu_x86_cpuid() of target/i386/cpu.c, when CPUID.8001H is
calculated, here "cpus per package" is expected to be checked, but in
fact, now it checks "cpus per die". Though "cpus per die" also works
for this code logic, this isn't consistent with AMD's APM.
5. In cpu_x86_cpuid() of target/i386/cpu.c, CPUID.8008H:ECX expects
"cpus per package" but it obtains "cpus per die".
6. In simulate_rdmsr() of target/i386/hvf/x86_emu.c, in
kvm_rdmsr_core_thread_count() of target/i386/kvm/kvm.c, and in
helper_rdmsr() of target/i386/tcg/sysemu/misc_helper.c,
MSR_CORE_THREAD_COUNT expects "cpus per package" and "cores per
package", but in these functions, it obtains "cpus per die" and
"cores per die".

On the other hand, these uses are correct now (they are added in/after
a94e1428991f):
1. In cpu_x86_cpuid() of target/i386/cpu.c, topo_info.cores_per_die
meets the actual meaning of CPUState.nr_cores ("cores per die").
2. In cpu_x86_cpuid() of target/i386/cpu.c, vcpus_per_socket (in CPUID.
04H's calculation) considers number of dies, so it's correct.
3. In cpu_x86_cpuid() of target/i386/cpu.c, CPUID.1FH.01H:EBX[bits
15:00] needs "cpus per die" and it gets the correct result, and
CPUID.1FH.02H:EBX[bits 15:00] gets correct "cpus per package".

When CPUState.nr_cores is correctly changed back to "cores per package",
the above errors will be fixed without extra work, but the currently
correct cases will go wrong and need special handling to pass the
correct "cpus/cores per die" they want.

Fix the calculation of CPUState.nr_cores to match the original meaning
"cores per package", and adjust the calculation of
topo_info.cores_per_die, vcpus_per_socket and CPUID.1FH accordingly.


What a pain. Can we split this patch in 2, first the x86 part
and then the common part (softmmu/cpus.c)?


Fixes: a94e1428991f ("target/i386: Add CPUID.1F generation support for multi-dies PCMachine")
Fixes: 003f230e37d7 ("machine: Tweak the order of topology members in struct CpuTopology")
Signed-off-by: Zhuocheng Ding 
Co-developed-by: Zhao Liu 
Signed-off-by: Zhao Liu 
---
Changes since v3:
  * Describe changes in imperative mood. (Babu)
  * Fix spelling typo. (Babu)
  * Split the comment change into a separate patch. (Xiaoyao)

Changes since v2:
  * Use wrapped helper to get cores per socket in qemu_init_vcpu().

Changes since v1:
  * Add comment for nr_dies in CPUX86State. (Yanan)
---
  softmmu/cpus.c| 2 +-
  target/i386/cpu.c | 9 -
  2 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/softmmu/cpus.c b/softmmu/cpus.c
index 0848e0dbdb3f..fa8239c217ff 100644
--- a/softmmu/cpus.c
+++ b/softmmu/cpus.c
@@ -624

Re: [PATCH v4 10/21] i386: Introduce module-level cpu topology to CPUX86State

2023-09-14 Thread Philippe Mathieu-Daudé

On 14/9/23 09:21, Zhao Liu wrote:

From: Zhuocheng Ding 

The smp command has the "clusters" parameter, but x86 hasn't supported
that level yet. "cluster" is a CPU topology level above cores, in which
the cores may share some resources (the L2 cache, or others such as L3
cache tags, depending on the architecture) [1][2]. For x86, the
resource shared by cores at the cluster level is mainly the L2 cache.

However, using cluster to define x86's L2 cache topology will cause the
compatibility problem:

Currently, x86 defaults to the L2 cache being shared by one core, which
actually implies a default setting of "cores per L2 cache is 1" and
therefore implicitly defaults to having as many L2 caches as cores.

For example (i386 PC machine):
-smp 16,sockets=2,dies=2,cores=2,threads=2,maxcpus=16 (*)

Considering the topology of the L2 cache, this (*) implicitly means "1
core per L2 cache" and "2 L2 caches per die".

If we use cluster to configure L2 cache topology with the new default
setting "clusters per L2 cache is 1", the above semantics will change
to "2 cores per cluster" and "1 cluster per L2 cache", that is, "2
cores per L2 cache".

So the same command (*) will cause changes in the L2 cache topology,
further affecting the performance of the virtual machine.

Therefore, x86 should only treat cluster as a CPU topology level and
avoid using it to change the L2 cache topology by default, for
compatibility.

"cluster" in smp is the CPU topology level between "core" and "die".

For x86, the "cluster" in smp corresponds to the module level [2],
which is above the core level. So use "module" rather than "cluster"
in the i386 code.

And please note that x86 already has a CPU topology level also named
"cluster" [3]; that level is above the package level. The cluster in
the x86 CPU topology is completely different from the "clusters" smp
parameter. After the module level is introduced, the cluster smp
parameter will actually refer to the module level of x86.

[1]: 864c3b5c32f0 ("hw/core/machine: Introduce CPU cluster topology support")
[2]: Yanan's comment about "cluster",
  https://lists.gnu.org/archive/html/qemu-devel/2023-02/msg04051.html
[3]: SDM, vol.3, ch.9, 9.9.1 Hierarchical Mapping of Shared Resources.

Signed-off-by: Zhuocheng Ding 
Co-developed-by: Zhao Liu 
Signed-off-by: Zhao Liu 
Acked-by: Michael S. Tsirkin 
---
Changes since v1:
  * The background of the introduction of the "cluster" parameter and its
exact meaning were revised according to Yanan's explanation. (Yanan)
---
  hw/i386/x86.c | 1 +
  target/i386/cpu.c | 1 +
  target/i386/cpu.h | 5 +
  3 files changed, 7 insertions(+)




diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 470257b92240..556e80f29764 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1903,6 +1903,11 @@ typedef struct CPUArchState {
  
  /* Number of dies within this CPU package. */

  unsigned nr_dies;
+/*
+ * Number of modules within this CPU package.
+ * Module level in x86 cpu topology is corresponding to smp.clusters.
+ */
+unsigned nr_modules;
  } CPUX86State;


It would be really useful to have an ASCII art comment showing
the architecture topology. Also for clarity the topo fields from
CPU[Arch]State could be moved into a 'topo' sub structure, or even
clearer would be to re-use the X86CPUTopoIDs structure?



Re: [PATCH v4 21/21] i386: Add new property to control L2 cache topo in CPUID.04H

2023-09-14 Thread Philippe Mathieu-Daudé

On 14/9/23 09:21, Zhao Liu wrote:

From: Zhao Liu 

The property x-l2-cache-topo will be used to change the L2 cache
topology in CPUID.04H.

Now it allows the user to set whether the L2 cache is shared at the
core level or at the cluster level.

If the user passes "-cpu x-l2-cache-topo=[core|cluster]", the old L2
cache topology will be overridden by the new topology setting.

Here we expose "cluster" to the user instead of "module", to be
consistent with the "cluster-id" naming.

Since CPUID.04H is used by Intel CPUs, this property is only available
on Intel CPUs for now.

When necessary, it can be extended to CPUID.801DH for AMD CPUs.

(Tested the cache topology in CPUID[0x04] leaf with "x-l2-cache-topo=[
core|cluster]", and tested the live migration between the QEMUs w/ &
w/o this patch series.)

Signed-off-by: Zhao Liu 
Tested-by: Yongwei Ma 
---
Changes since v3:
  * Add description about test for live migration compatibility. (Babu)

Changes since v1:
  * Rename MODULE branch to CPU_TOPO_LEVEL_MODULE to match the previous
renaming changes.
---
  target/i386/cpu.c | 34 +-
  target/i386/cpu.h |  2 ++
  2 files changed, 35 insertions(+), 1 deletion(-)




@@ -8079,6 +8110,7 @@ static Property x86_cpu_properties[] = {
   false),
  DEFINE_PROP_BOOL("x-intel-pt-auto-level", X86CPU, intel_pt_auto_level,
   true),
+DEFINE_PROP_STRING("x-l2-cache-topo", X86CPU, l2_cache_topo_level),


We use the 'x-' prefix for unstable features; is that the case here?


  DEFINE_PROP_END_OF_LIST()
  };





Re: [PATCH 4/5] elf2dmp: use Linux mmap with MAP_NORESERVE when possible

2023-09-14 Thread Akihiko Odaki

On 2023/09/14 7:46, Viktor Prutyanov wrote:

Glib's g_mapped_file_new maps file with PROT_READ|PROT_WRITE and
MAP_PRIVATE. This leads to premature physical memory allocation of dump
file size on Linux hosts and may fail. On Linux, mapping the file with
MAP_NORESERVE limits the allocation by available memory.

Signed-off-by: Viktor Prutyanov 
---
  contrib/elf2dmp/qemu_elf.c | 66 +++---
  contrib/elf2dmp/qemu_elf.h |  4 +++
  2 files changed, 58 insertions(+), 12 deletions(-)

diff --git a/contrib/elf2dmp/qemu_elf.c b/contrib/elf2dmp/qemu_elf.c
index ebda60dcb8..94a8c3ad15 100644
--- a/contrib/elf2dmp/qemu_elf.c
+++ b/contrib/elf2dmp/qemu_elf.c
@@ -165,10 +165,37 @@ static bool check_ehdr(QEMU_Elf *qe)
  return true;
  }
  
-int QEMU_Elf_init(QEMU_Elf *qe, const char *filename)

+static int QEMU_Elf_map(QEMU_Elf *qe, const char *filename)
  {
+#ifdef CONFIG_LINUX


Here CONFIG_LINUX is used while qemu_elf.h uses CONFIG_POSIX.
I also wonder if the GLib implementation is really necessary.


+struct stat st;
+
+printf("Using Linux's mmap\n");
+
+qe->fd = open(filename, O_RDONLY, 0);
+if (qe->fd == -1) {
+eprintf("Failed to open ELF dump file \'%s\'\n", filename);
+return 1;
+}
+
+if (fstat(qe->fd, &st)) {
+eprintf("Failed to get size of ELF dump file\n");
+close(qe->fd);
+return 1;
+}
+qe->size = st.st_size;
+
+qe->map = mmap(NULL, qe->size, PROT_READ | PROT_WRITE,
+MAP_PRIVATE | MAP_NORESERVE, qe->fd, 0);


It should be possible to close the file immediately after mmap().



Re: [PATCH v5 3/3] hw/nvme: add nvme management interface model

2023-09-14 Thread Andrew Jeffery
Hi Klaus,

On Thu, 2023-09-14 at 08:51 +0200, Klaus Jensen wrote:
> On Sep 12 13:50, Andrew Jeffery wrote:
> > Hi Klaus,
> > 
> > On Tue, 2023-09-05 at 10:38 +0200, Klaus Jensen wrote:
> > > > 
> > > > +static void nmi_handle_mi_config_get(NMIDevice *nmi, NMIRequest
> > > > *request)
> > > > +{
> > > > +    uint32_t dw0 = le32_to_cpu(request->dw0);
> > > > +    uint8_t identifier = FIELD_EX32(dw0,
> > > > NMI_CMD_CONFIGURATION_GET_DW0,
> > > > +    IDENTIFIER);
> > > > +    const uint8_t *buf;
> > > > +
> > > > +    static const uint8_t smbus_freq[4] = {
> > > > +    0x00,   /* success */
> > > > +    0x01, 0x00, 0x00,   /* 100 kHz */
> > > > +    };
> > > > +
> > > > +    static const uint8_t mtu[4] = {
> > > > +    0x00,   /* success */
> > > > +    0x40, 0x00, /* 64 */
> > > > +    0x00,   /* reserved */
> > > > +    };
> > > > +
> > > > +    trace_nmi_handle_mi_config_get(identifier);
> > > > +
> > > > +    switch (identifier) {
> > > > +    case NMI_CMD_CONFIGURATION_GET_SMBUS_FREQ:
> > > > +    buf = smbus_freq;
> > > > +    break;
> > > > +
> > > > +    case NMI_CMD_CONFIGURATION_GET_MCTP_TRANSMISSION_UNIT:
> > > > +    buf = mtu;
> > > > +    break;
> > > > +
> > > > +    default:
> > > > +    nmi_set_parameter_error(nmi, 0x0, offsetof(NMIRequest,
> > > > dw0));
> > > > +    return;
> > > > +    }
> > > > +
> > > > +    nmi_scratch_append(nmi, buf, sizeof(buf));
> > > > +}
> > 
> > When I tried to build this patch I got:
> > 
> > ```
> > In file included from /usr/include/string.h:535,
> >  from 
> > /home/andrew/src/qemu.org/qemu/include/qemu/osdep.h:112,
> >  from ../hw/nvme/nmi-i2c.c:12:
> > In function ‘memcpy’,
> > inlined from ‘nmi_scratch_append’ at ../hw/nvme/nmi-i2c.c:80:5,
> > inlined from ‘nmi_handle_mi_config_get’ at ../hw/nvme/nmi-i2c.c:246:5,
> > inlined from ‘nmi_handle_mi’ at ../hw/nvme/nmi-i2c.c:266:9,
> > inlined from ‘nmi_handle’ at ../hw/nvme/nmi-i2c.c:313:9:
> > /usr/include/x86_64-linux-gnu/bits/string_fortified.h:29:10: error: 
> > ‘__builtin_memcpy’ forming offset [4, 7] is out of the bounds [0, 4] 
> > [-Werror=array-bounds=]
> >29 |   return __builtin___memcpy_chk (__dest, __src, __len,
> >   |  ^
> >30 |  __glibc_objsize0 (__dest));
> >   |  ~~
> > ```
> > 
> > It wasn't clear initially from the error that the source of the problem
> > was the size associated with the source buffer, especially as there is
> > some pointer arithmetic being done to derive `__dest`.
> > 
> > Anyway, what we're trying to express is that the size to copy from buf
> > is the size of the array pointed to by buf. However, buf is declared as
> > a pointer to uint8_t, which loses the length information. To fix that I
> > think we need:
> > 
> > - const uint8_t *buf;
> > + const uint8_t (*buf)[4];
> > 
> > and then:
> > 
> > - nmi_scratch_append(nmi, buf, sizeof(buf));
> > + nmi_scratch_append(nmi, buf, sizeof(*buf));
> > 
> > Andrew
> > 
> 
> Hi Andrew,
> 
> Nice (and important) catch! Just curious, are you massaging QEMU's build
> system into adding additional checks or how did your compiler catch
> this?

No tricks to be honest, I just applied your patches on top of
9ef497755afc ("Merge tag 'pull-vfio-20230911' of
https://github.com/legoater/qemu into staging") using `b4 shazam ...`.

I'm building on Debian Bookworm:

$ gcc --version | head -n1
gcc (Debian 12.2.0-14) 12.2.0

Andrew



[PATCH v3 2/3] linux-user/syscall.c: do_ppoll: consolidate and fix the forgotten unlock_user

2023-09-14 Thread Michael Tokarev
In do_ppoll() there is one place where unlock_user() isn't called at
all, while other places pass 0 for the size of the area being unlocked
instead of the actual size.

Instead of open-coding calls to unlock_user(), jump to the end of the
function and do a single unlock there.

Note: the original code calls unlock_user() with target_pfd being NULL
in one case (when nfds == 0). Move the initializers to the variable
declarations; I wondered a few times whether target_pfd was left
uninitialized for unlock_user().

Signed-off-by: Michael Tokarev 
---
 linux-user/syscall.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 33bf84c205..eabdf50abc 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -1487,14 +1487,12 @@ static abi_long do_pselect6(abi_long arg1, abi_long 
arg2, abi_long arg3,
 static abi_long do_ppoll(abi_long arg1, abi_long arg2, abi_long arg3,
  abi_long arg4, abi_long arg5, bool ppoll, bool time64)
 {
-struct target_pollfd *target_pfd;
+struct target_pollfd *target_pfd = NULL;
 unsigned int nfds = arg2;
-struct pollfd *pfd;
+struct pollfd *pfd = NULL;
 unsigned int i;
 abi_long ret;
 
-pfd = NULL;
-target_pfd = NULL;
 if (nfds) {
 if (nfds > (INT_MAX / sizeof(struct target_pollfd))) {
 return -TARGET_EINVAL;
@@ -1519,8 +1517,8 @@ static abi_long do_ppoll(abi_long arg1, abi_long arg2, 
abi_long arg3,
 if (time64
 ? target_to_host_timespec64(timeout_ts, arg3)
 : target_to_host_timespec(timeout_ts, arg3)) {
-unlock_user(target_pfd, arg1, 0);
-return -TARGET_EFAULT;
+ret = -TARGET_EFAULT;
+goto out;
 }
 } else {
 timeout_ts = NULL;
@@ -1529,8 +1527,7 @@ static abi_long do_ppoll(abi_long arg1, abi_long arg2, 
abi_long arg3,
 if (arg4) {
 ret = process_sigsuspend_mask(&set, arg4, arg5);
 if (ret != 0) {
-unlock_user(target_pfd, arg1, 0);
-return ret;
+goto out;
 }
 }
 
@@ -1544,7 +1541,8 @@ static abi_long do_ppoll(abi_long arg1, abi_long arg2, 
abi_long arg3,
 if (time64
 ? host_to_target_timespec64(arg3, timeout_ts)
 : host_to_target_timespec(arg3, timeout_ts)) {
-return -TARGET_EFAULT;
+ret = -TARGET_EFAULT;
+goto out;
 }
 }
 } else {
@@ -1567,6 +1565,8 @@ static abi_long do_ppoll(abi_long arg1, abi_long arg2, 
abi_long arg3,
 target_pfd[i].revents = tswap16(pfd[i].revents);
 }
 }
+
+out:
 unlock_user(target_pfd, arg1, sizeof(struct target_pollfd) * nfds);
 return ret;
 }
-- 
2.39.2




[PATCH v3 3/3] linux-user/syscall.c: do_ppoll: eliminate large alloca

2023-09-14 Thread Michael Tokarev
do_ppoll() in linux-user/syscall.c uses alloca() to allocate an array
of struct pollfd entries on the stack. The only upper bound on the
number of entries for this array is that the whole thing fits in
INT_MAX. This is definitely too much for a stack allocation.

Use heap allocation when a large number of entries is requested
(currently more than 64, arbitrary), and continue to use alloca() for
smaller allocations to keep small operations fast. The code for this
optimization is small, so I see no reason to drop it.

This eliminates the last large user-controlled on-stack allocation
from syscall.c.

Signed-off-by: Michael Tokarev 
---
 linux-user/syscall.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index eabdf50abc..1dbe28eba4 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -1489,7 +1489,7 @@ static abi_long do_ppoll(abi_long arg1, abi_long arg2, 
abi_long arg3,
 {
 struct target_pollfd *target_pfd = NULL;
 unsigned int nfds = arg2;
-struct pollfd *pfd = NULL;
+struct pollfd *pfd = NULL, *heap_pfd = NULL;
 unsigned int i;
 abi_long ret;
 
@@ -1503,7 +1503,17 @@ static abi_long do_ppoll(abi_long arg1, abi_long arg2, 
abi_long arg3,
 return -TARGET_EFAULT;
 }
 
-pfd = alloca(sizeof(struct pollfd) * nfds);
+/* arbitrary "small" number to limit stack usage */
+if (nfds <= 64) {
+pfd = alloca(sizeof(struct pollfd) * nfds);
+} else {
+heap_pfd = g_try_new(struct pollfd, nfds);
+if (!heap_pfd) {
+ret = -TARGET_ENOMEM;
+goto out;
+}
+pfd = heap_pfd;
+}
 for (i = 0; i < nfds; i++) {
 pfd[i].fd = tswap32(target_pfd[i].fd);
 pfd[i].events = tswap16(target_pfd[i].events);
@@ -1567,6 +1577,7 @@ static abi_long do_ppoll(abi_long arg1, abi_long arg2, 
abi_long arg3,
 }
 
 out:
+g_free(heap_pfd);
 unlock_user(target_pfd, arg1, sizeof(struct target_pollfd) * nfds);
 return ret;
 }
-- 
2.39.2




[PATCH v3 0/3] linux-user/syscall.c: do_ppoll: eliminate large alloca

2023-09-14 Thread Michael Tokarev
This is a v3 patch (now a patchset) which eliminates the
guest-controlled alloca from linux-user's ppoll. I have now split out
two unrelated preparatory changes into their own patches for easier
review.

The small optimization which was here in v1 is still there. In a huge
number of use cases, poll() et al. are called with just one file
descriptor; there's no need to use heap allocation for this, and the
code to avoid this heap allocation is already there, is short, and is
easy to read too (YMMV).

This patchset passes the poll- and ppoll-related LTP tests, except for
the ppoll_time64 case (tested on ppc64):

ppoll01.c:174: TCONF: syscall(414) __NR_ppoll_time64 not supported on your arch

which was here before.

Michael Tokarev (3):
  linux-user/syscall.c: do_ppoll: simplify time64 host<=>target
conversion expressions
  linux-user/syscall.c: do_ppoll: consolidate and fix the forgotten
unlock_user
  linux-user/syscall.c: do_ppoll: eliminate large alloca

 linux-user/syscall.c | 52 +++-
 1 file changed, 27 insertions(+), 25 deletions(-)

-- 
2.39.2




[PATCH v3 1/3] linux-user/syscall.c: do_ppoll: simplify time64 host<=>target conversion expressions

2023-09-14 Thread Michael Tokarev
Replace the if (time64) { time64-expr } else { time32-expr }
expressions, which are difficult to read, with much shorter and
easier-to-read arithmetic-if (ternary) constructs.

Signed-off-by: Michael Tokarev 
---
 linux-user/syscall.c | 27 +--
 1 file changed, 9 insertions(+), 18 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 3521a2d70b..33bf84c205 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -1516,16 +1516,11 @@ static abi_long do_ppoll(abi_long arg1, abi_long arg2, 
abi_long arg3,
 sigset_t *set = NULL;
 
 if (arg3) {
-if (time64) {
-if (target_to_host_timespec64(timeout_ts, arg3)) {
-unlock_user(target_pfd, arg1, 0);
-return -TARGET_EFAULT;
-}
-} else {
-if (target_to_host_timespec(timeout_ts, arg3)) {
-unlock_user(target_pfd, arg1, 0);
-return -TARGET_EFAULT;
-}
+if (time64
+? target_to_host_timespec64(timeout_ts, arg3)
+: target_to_host_timespec(timeout_ts, arg3)) {
+unlock_user(target_pfd, arg1, 0);
+return -TARGET_EFAULT;
 }
 } else {
 timeout_ts = NULL;
@@ -1546,14 +1541,10 @@ static abi_long do_ppoll(abi_long arg1, abi_long arg2, 
abi_long arg3,
 finish_sigsuspend_mask(ret);
 }
 if (!is_error(ret) && arg3) {
-if (time64) {
-if (host_to_target_timespec64(arg3, timeout_ts)) {
-return -TARGET_EFAULT;
-}
-} else {
-if (host_to_target_timespec(arg3, timeout_ts)) {
-return -TARGET_EFAULT;
-}
+if (time64
+? host_to_target_timespec64(arg3, timeout_ts)
+: host_to_target_timespec(arg3, timeout_ts)) {
+return -TARGET_EFAULT;
 }
 }
 } else {
-- 
2.39.2




Re: [PATCH] vhost: Add a defensive check in vhost_commit against wrong deallocation

2023-09-14 Thread Eric Auger
Hi Jason,

On 9/14/23 05:46, Jason Wang wrote:
> On Wed, Sep 13, 2023 at 3:47 PM Eric Auger  wrote:
>> In vhost_commit(), it may happen that dev->mem_sections and
>> dev->tmp_sections are equal, in which case, unconditionally
>> freeing old_sections at the end of the function will also free
>> dev->mem_sections used on subsequent call leading to a segmentation
>> fault.
>>
>> Check this situation before deallocating memory.
>>
>> Signed-off-by: Eric Auger 
>> Fixes: c44317efecb2 ("vhost: Build temporary section list and deref
>> after commit")
>> CC: QEMU Stable 
>>
>> ---
>>
This SIGSEGV condition can be reproduced with
https://lore.kernel.org/all/20230904080451.424731-1-eric.au...@redhat.com/#r
This is most probably happening in a situation where the memory API is
used in the wrong manner, but well.
> Any chance to move this to the memory API or we may end up with things
> like this in another listener?

I am not very familiar with the vhost code but aren't those tmp_sections
and mem_sections really specific to the vhost device? I am not sure we
can easily generalize.

Thanks

Eric
>
> Thanks
>
>> ---
>>  hw/virtio/vhost.c | 5 +
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>> index e2f6ffb446..c02c599ef0 100644
>> --- a/hw/virtio/vhost.c
>> +++ b/hw/virtio/vhost.c
>> @@ -545,6 +545,11 @@ static void vhost_commit(MemoryListener *listener)
>>  dev->mem_sections = dev->tmp_sections;
>>  dev->n_mem_sections = dev->n_tmp_sections;
>>
>> +if (old_sections == dev->mem_sections) {
>> +assert(n_old_sections ==  dev->n_mem_sections);
>> +return;
>> +}
>> +
>>  if (dev->n_mem_sections != n_old_sections) {
>>  changed = true;
>>  } else {
>> --
>> 2.41.0
>>




Re: [PATCH 3/5] elf2dmp: introduce merging of physical memory runs

2023-09-14 Thread Akihiko Odaki

On 2023/09/14 7:46, Viktor Prutyanov wrote:

DMP supports 42 physical memory runs at most. So, merge adjacent
physical memory ranges from QEMU ELF when possible to minimize total
number of runs.

Signed-off-by: Viktor Prutyanov 
---
  contrib/elf2dmp/main.c | 56 --
  1 file changed, 48 insertions(+), 8 deletions(-)

diff --git a/contrib/elf2dmp/main.c b/contrib/elf2dmp/main.c
index b7e3930164..9ef5cfcd23 100644
--- a/contrib/elf2dmp/main.c
+++ b/contrib/elf2dmp/main.c
@@ -20,6 +20,7 @@
  #define PE_NAME "ntoskrnl.exe"
  
  #define INITIAL_MXCSR   0x1f80

+#define MAX_NUMBER_OF_RUNS  42
  
  typedef struct idt_desc {

  uint16_t offset1;   /* offset bits 0..15 */
@@ -234,6 +235,42 @@ static int fix_dtb(struct va_space *vs, QEMU_Elf *qe)
  return 1;
  }
  
+static void try_merge_runs(struct pa_space *ps,

+WinDumpPhyMemDesc64 *PhysicalMemoryBlock)
+{
+unsigned int merge_cnt = 0, run_idx = 0;
+
+PhysicalMemoryBlock->NumberOfRuns = 0;
+
+for (unsigned int idx = 0; idx < ps->block_nr; idx++) {
+struct pa_block *blk = ps->block + idx;
+struct pa_block *next = blk + 1;
+
+PhysicalMemoryBlock->NumberOfPages += blk->size / ELF2DMP_PAGE_SIZE;
+
+if (idx + 1 != ps->block_nr && blk->paddr + blk->size == next->paddr) {
+printf("Block #%u 0x%"PRIx64"+:0x%"PRIx64" and %u previous will be "
+"merged\n", idx, blk->paddr, blk->size, merge_cnt);
+merge_cnt++;
+} else {
+struct pa_block *first_merged = blk - merge_cnt;
+
+printf("Block #%u 0x%"PRIx64"+:0x%"PRIx64" and %u previous will be "
+"merged to 0x%"PRIx64"+:0x%"PRIx64" and saved as run #%u\n",
+idx, blk->paddr, blk->size, merge_cnt, first_merged->paddr,
+blk->paddr + blk->size - first_merged->paddr, run_idx);
+PhysicalMemoryBlock->Run[run_idx] = (WinDumpPhyMemRun64) {
+.BasePage = first_merged->paddr / ELF2DMP_PAGE_SIZE,
+.PageCount = (blk->paddr + blk->size - first_merged->paddr) /
+ELF2DMP_PAGE_SIZE,
+};
+PhysicalMemoryBlock->NumberOfRuns++;
+run_idx++;
+merge_cnt = 0;
+}
+}
+}
+
  static int fill_header(WinDumpHeader64 *hdr, struct pa_space *ps,
  struct va_space *vs, uint64_t KdDebuggerDataBlock,
  KDDEBUGGER_DATA64 *kdbg, uint64_t KdVersionBlock, int nr_cpus)
@@ -244,7 +281,6 @@ static int fill_header(WinDumpHeader64 *hdr, struct 
pa_space *ps,
  KUSD_OFFSET_PRODUCT_TYPE);
  DBGKD_GET_VERSION64 kvb;
  WinDumpHeader64 h;
-size_t i;
  
  QEMU_BUILD_BUG_ON(KUSD_OFFSET_SUITE_MASK >= ELF2DMP_PAGE_SIZE);

  QEMU_BUILD_BUG_ON(KUSD_OFFSET_PRODUCT_TYPE >= ELF2DMP_PAGE_SIZE);
@@ -282,13 +318,17 @@ static int fill_header(WinDumpHeader64 *hdr, struct 
pa_space *ps,
  .RequiredDumpSpace = sizeof(h),
  };
  
-for (i = 0; i < ps->block_nr; i++) {

-h.PhysicalMemoryBlock.NumberOfPages +=
-ps->block[i].size / ELF2DMP_PAGE_SIZE;
-h.PhysicalMemoryBlock.Run[i] = (WinDumpPhyMemRun64) {
-.BasePage = ps->block[i].paddr / ELF2DMP_PAGE_SIZE,
-.PageCount = ps->block[i].size / ELF2DMP_PAGE_SIZE,
-};
+if (h.PhysicalMemoryBlock.NumberOfRuns <= MAX_NUMBER_OF_RUNS) {
+for (unsigned int idx = 0; idx < ps->block_nr; idx++) {


I suggest keep it size_t since that's the type of ps->block_nr. It's 
somewhat annoying typing something long like "unsigned int" too.




Re: [PATCH 5/5] elf2dmp: rework PDB_STREAM_INDEXES::segments obtaining

2023-09-14 Thread Akihiko Odaki

On 2023/09/14 7:46, Viktor Prutyanov wrote:

PDB for Windows 11 kernel has slightly different structure compared to
previous versions. Since elf2dmp don't use the other fields, copy only
'segments' field from PDB_STREAM_INDEXES.


I suggest replacing the sidx member of struct pdb_reader with a single 
uint16_t to save some space and prevent accidentally introducing 
references to other members.




Re: [PATCH v2 08/20] asc: generate silence if FIFO empty but engine still running

2023-09-14 Thread Philippe Mathieu-Daudé



On 9/9/23 11:48, Mark Cave-Ayland wrote:

MacOS (un)helpfully leaves the FIFO engine running even when all the samples 
have
been written to the hardware, and expects the FIFO status flags and IRQ to be
updated continuously.

There is an additional problem in that not all audio backends guarantee an
all-zero output when there is no FIFO data available, in particular the Windows
dsound backend which re-uses its internal circular buffer causing the last 
played
sound to loop indefinitely.

Whilst this is effectively a bug in the Windows dsound backend, work around it
for now using a simple heuristic: if the FIFO remains empty for half a cycle
(~23ms) then continuously fill the generated buffer with silence.

Signed-off-by: Mark Cave-Ayland 
---
  hw/audio/asc.c | 19 +++
  include/hw/audio/asc.h |  2 ++
  2 files changed, 21 insertions(+)

diff --git a/hw/audio/asc.c b/hw/audio/asc.c
index 336ace0cd6..b01b285512 100644
--- a/hw/audio/asc.c
+++ b/hw/audio/asc.c
@@ -334,6 +334,21 @@ static void asc_out_cb(void *opaque, int free_b)
  }
  
  if (!generated) {

+/* Workaround for audio underflow bug on Windows dsound backend */
+int64_t now = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
+int silent_samples = muldiv64(now - s->fifo_empty_ns,
+  NANOSECONDS_PER_SECOND, ASC_FREQ);
+
+if (silent_samples > ASC_FIFO_CYCLE_TIME / 2) {
+/*
+ * No new FIFO data within half a cycle time (~23ms) so fill the
+ * entire available buffer with silence. This prevents an issue
+ * with the Windows dsound backend whereby the sound appears to
+ * loop because the FIFO has run out of data, and the driver
+ * reuses the stale content in its circular audio buffer.
+ */
+AUD_write(s->voice, s->silentbuf, samples << s->shift);
+}
  return;
  }


What about having audio_callback_fn return a boolean, and using
a flag in backends for that silence case? Roughly:

-- >8 --
diff --git a/audio/audio.h b/audio/audio.h
index 01bdc567fb..4844771c92 100644
--- a/audio/audio.h
+++ b/audio/audio.h
@@ -30,7 +30,7 @@
 #include "hw/qdev-properties.h"
 #include "hw/qdev-properties-system.h"

-typedef void (*audio_callback_fn) (void *opaque, int avail);
+typedef bool (*audio_callback_fn) (void *opaque, int avail);

 #if HOST_BIG_ENDIAN
 #define AUDIO_HOST_ENDIANNESS 1
diff --git a/audio/audio.c b/audio/audio.c
index 90c7c49d11..5b6e69fbd6 100644
--- a/audio/audio.c
+++ b/audio/audio.c
@@ -1178,8 +1178,11 @@ static void audio_run_out (AudioState *s)
 if (free > sw->resample_buf.pos) {
 free = MIN(free, sw->resample_buf.size)
- sw->resample_buf.pos;
-sw->callback.fn(sw->callback.opaque,
-free * sw->info.bytes_per_frame);
+if (!sw->callback.fn(sw->callback.opaque,
+ free * sw->info.bytes_per_frame)
+&& unlikely(hw->silentbuf_required)) {
+/* write silence ... */
+}
 }
 }
 }
---



[RFC PATCH v2 2/2] hw/riscv: hart: allow other cpu instance

2023-09-14 Thread Nikita Shubin
From: Nikita Shubin 

Allow using instances derived from RISCVCPU.

Signed-off-by: Nikita Shubin 
---
 hw/riscv/riscv_hart.c | 20 
 include/hw/riscv/riscv_hart.h |  2 +-
 2 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/hw/riscv/riscv_hart.c b/hw/riscv/riscv_hart.c
index 613ea2aaa0..020ba18e8b 100644
--- a/hw/riscv/riscv_hart.c
+++ b/hw/riscv/riscv_hart.c
@@ -43,24 +43,28 @@ static void riscv_harts_cpu_reset(void *opaque)
 }
 
 static bool riscv_hart_realize(RISCVHartArrayState *s, int idx,
-   char *cpu_type, Error **errp)
+   char *cpu_type, size_t size, Error **errp)
 {
-object_initialize_child(OBJECT(s), "harts[*]", &s->harts[idx], cpu_type);
-qdev_prop_set_uint64(DEVICE(&s->harts[idx]), "resetvec", s->resetvec);
-s->harts[idx].env.mhartid = s->hartid_base + idx;
-qemu_register_reset(riscv_harts_cpu_reset, &s->harts[idx]);
-return qdev_realize(DEVICE(&s->harts[idx]), NULL, errp);
+RISCVCPU *hart = s->harts[idx];
+object_initialize_child_internal(OBJECT(s), "harts[*]",
+hart, size, cpu_type);
+qdev_prop_set_uint64(DEVICE(hart), "resetvec", s->resetvec);
+hart->env.mhartid = s->hartid_base + idx;
+qemu_register_reset(riscv_harts_cpu_reset, hart);
+return qdev_realize(DEVICE(hart), NULL, errp);
 }
 
 static void riscv_harts_realize(DeviceState *dev, Error **errp)
 {
 RISCVHartArrayState *s = RISCV_HART_ARRAY(dev);
+size_t size = object_type_get_instance_size(s->cpu_type);
 int n;
 
-s->harts = g_new0(RISCVCPU, s->num_harts);
+s->harts = g_new0(RISCVCPU *, s->num_harts);
 
 for (n = 0; n < s->num_harts; n++) {
-if (!riscv_hart_realize(s, n, s->cpu_type, errp)) {
+s->harts[n] = RISCV_CPU(object_new(s->cpu_type));
+if (!riscv_hart_realize(s, n, s->cpu_type, size, errp)) {
 return;
 }
 }
diff --git a/include/hw/riscv/riscv_hart.h b/include/hw/riscv/riscv_hart.h
index 912b4a2682..5f6ef06411 100644
--- a/include/hw/riscv/riscv_hart.h
+++ b/include/hw/riscv/riscv_hart.h
@@ -38,7 +38,7 @@ struct RISCVHartArrayState {
 uint32_t hartid_base;
 char *cpu_type;
 uint64_t resetvec;
-RISCVCPU *harts;
+RISCVCPU **harts;
 };
 
 #endif
-- 
2.39.2




[RFC PATCH v2 0/2] hw/riscv: hart: allow other cpu instance

2023-09-14 Thread Nikita Shubin
From: Nikita Shubin 

Currently it is not possible to overload the instance of RISCVCPU,
i.e. to do something like this:

static const TypeInfo riscv_cpu_type_infos[] = {
    {
        .name = TYPE_ANOTHER_RISCV_CPU,
        .parent = TYPE_RISCV_CPU,
        .instance_size = sizeof(MyCPUState),
        .instance_init = riscv_my_cpu_init,
    }
};

Because RISCVHartArrayState.harts is sized for exactly RISCVCPU.

Derived instances can be used to store some internal hart state.

Cc: Daniel Henrique Barboza 
Link: 
https://patchwork.kernel.org/project/qemu-devel/patch/20230727080545.7908-1-nikita.shu...@maquefel.me/

Nikita Shubin (2):
  hw/riscv: hart: replace array access with qemu_get_cpu()
  hw/riscv: hart: allow other cpu instance

 hw/riscv/boot.c   |  6 --
 hw/riscv/riscv_hart.c | 20 
 hw/riscv/sifive_u.c   |  7 +--
 hw/riscv/spike.c  | 17 ++---
 hw/riscv/virt-acpi-build.c|  2 +-
 hw/riscv/virt.c   | 17 +
 include/hw/riscv/riscv_hart.h |  2 +-
 7 files changed, 42 insertions(+), 29 deletions(-)

-- 
2.39.2




[RFC PATCH v2 1/2] hw/riscv: hart: replace array access with qemu_get_cpu()

2023-09-14 Thread Nikita Shubin
From: Nikita Shubin 

Replace all RISCVHartArrayState->harts[idx] accesses with
qemu_get_cpu()/cpu_by_arch_id().

cpu_index is guaranteed to be contiguous by cpu_get_free_index(), so
the CPUs can be accessed in the same order they were added.

"Hart IDs might not necessarily be numbered contiguously in a
multiprocessor system, but at least one hart must have
a hart ID of zero."

This states that hart ID zero should always be present, which makes
using cpu_by_arch_id(0) safe.

Signed-off-by: Nikita Shubin 
---
 hw/riscv/boot.c|  6 --
 hw/riscv/sifive_u.c|  7 +--
 hw/riscv/spike.c   | 17 ++---
 hw/riscv/virt-acpi-build.c |  2 +-
 hw/riscv/virt.c| 17 +
 5 files changed, 29 insertions(+), 20 deletions(-)

diff --git a/hw/riscv/boot.c b/hw/riscv/boot.c
index 52bf8e67de..041f966e58 100644
--- a/hw/riscv/boot.c
+++ b/hw/riscv/boot.c
@@ -36,7 +36,8 @@
 
 bool riscv_is_32bit(RISCVHartArrayState *harts)
 {
-return harts->harts[0].env.misa_mxl_max == MXL_RV32;
+RISCVCPU *hart = RISCV_CPU(cpu_by_arch_id(0));
+return hart->env.misa_mxl_max == MXL_RV32;
 }
 
 /*
@@ -385,6 +386,7 @@ void riscv_setup_rom_reset_vec(MachineState *machine, 
RISCVHartArrayState *harts
uint64_t fdt_load_addr)
 {
 int i;
+RISCVCPU *hart = RISCV_CPU(cpu_by_arch_id(0));
 uint32_t start_addr_hi32 = 0x;
 uint32_t fdt_load_addr_hi32 = 0x;
 
@@ -414,7 +416,7 @@ void riscv_setup_rom_reset_vec(MachineState *machine, 
RISCVHartArrayState *harts
 reset_vec[4] = 0x0182b283;   /* ld t0, 24(t0) */
 }
 
-if (!harts->harts[0].cfg.ext_icsr) {
+if (!hart->cfg.ext_icsr) {
 /*
  * The Zicsr extension has been disabled, so let's ensure we don't
  * run the CSR instruction. Let's fill the address with a non
diff --git a/hw/riscv/sifive_u.c b/hw/riscv/sifive_u.c
index ec76dce6c9..3d09d0ee0e 100644
--- a/hw/riscv/sifive_u.c
+++ b/hw/riscv/sifive_u.c
@@ -168,6 +168,7 @@ static void create_fdt(SiFiveUState *s, const MemMapEntry 
*memmap,
 qemu_fdt_setprop_cell(fdt, "/cpus", "#address-cells", 0x1);
 
 for (cpu = ms->smp.cpus - 1; cpu >= 0; cpu--) {
+RISCVCPU *hart;
 int cpu_phandle = phandle++;
 nodename = g_strdup_printf("/cpus/cpu@%d", cpu);
 char *intc = g_strdup_printf("/cpus/cpu@%d/interrupt-controller", cpu);
@@ -180,9 +181,11 @@ static void create_fdt(SiFiveUState *s, const MemMapEntry 
*memmap,
 } else {
 qemu_fdt_setprop_string(fdt, nodename, "mmu-type", 
"riscv,sv48");
 }
-isa = riscv_isa_string(&s->soc.u_cpus.harts[cpu - 1]);
+hart = RISCV_CPU(qemu_get_cpu(cpu - 1));
+isa = riscv_isa_string(hart);
 } else {
-isa = riscv_isa_string(&s->soc.e_cpus.harts[0]);
+hart = RISCV_CPU(qemu_get_cpu(0));
+isa = riscv_isa_string(hart);
 }
 qemu_fdt_setprop_string(fdt, nodename, "riscv,isa", isa);
 qemu_fdt_setprop_string(fdt, nodename, "compatible", "riscv");
diff --git a/hw/riscv/spike.c b/hw/riscv/spike.c
index 81f7e53aed..f3ec6427a1 100644
--- a/hw/riscv/spike.c
+++ b/hw/riscv/spike.c
@@ -97,29 +97,32 @@ static void create_fdt(SpikeState *s, const MemMapEntry 
*memmap,
 qemu_fdt_add_subnode(fdt, "/cpus/cpu-map");
 
 for (socket = (riscv_socket_count(ms) - 1); socket >= 0; socket--) {
+uint32_t num_harts = s->soc[socket].num_harts;
+uint32_t hartid_base = s->soc[socket].hartid_base;
+
 clust_name = g_strdup_printf("/cpus/cpu-map/cluster%d", socket);
 qemu_fdt_add_subnode(fdt, clust_name);
 
-clint_cells =  g_new0(uint32_t, s->soc[socket].num_harts * 4);
+clint_cells =  g_new0(uint32_t, num_harts * 4);
 
-for (cpu = s->soc[socket].num_harts - 1; cpu >= 0; cpu--) {
+for (cpu = num_harts - 1; cpu >= 0; cpu--) {
+int cpu_index = hartid_base + cpu;
+RISCVCPU *hart = RISCV_CPU(qemu_get_cpu(cpu_index));
 cpu_phandle = phandle++;
 
-cpu_name = g_strdup_printf("/cpus/cpu@%d",
-s->soc[socket].hartid_base + cpu);
+cpu_name = g_strdup_printf("/cpus/cpu@%d", cpu_index);
 qemu_fdt_add_subnode(fdt, cpu_name);
 if (is_32_bit) {
 qemu_fdt_setprop_string(fdt, cpu_name, "mmu-type", 
"riscv,sv32");
 } else {
 qemu_fdt_setprop_string(fdt, cpu_name, "mmu-type", 
"riscv,sv48");
 }
-name = riscv_isa_string(&s->soc[socket].harts[cpu]);
+name = riscv_isa_string(hart);
 qemu_fdt_setprop_string(fdt, cpu_name, "riscv,isa", name);
 g_free(name);
 qemu_fdt_setprop_string(fdt, cpu_name, "compatible", "riscv");
 qemu_fdt_setprop_string(fdt, cpu_name, "status", "okay");
-qemu_fdt_setprop_cell(fdt, cpu_name, "reg",
-s->so

Re: [PATCH v4 1/2] tests: bump libvirt-ci for libasan and libxdp

2023-09-14 Thread Daniel P . Berrangé
On Wed, Sep 13, 2023 at 08:34:36PM +0200, Ilya Maximets wrote:
> This pulls in the fixes for libasan version as well as support for
> libxdp that will be used for af-xdp netdev in the next commits.
> 
> Signed-off-by: Ilya Maximets 
> ---
>  tests/docker/dockerfiles/debian-amd64-cross.docker   | 2 +-
>  tests/docker/dockerfiles/debian-amd64.docker | 2 +-
>  tests/docker/dockerfiles/debian-arm64-cross.docker   | 2 +-
>  tests/docker/dockerfiles/debian-armel-cross.docker   | 2 +-
>  tests/docker/dockerfiles/debian-armhf-cross.docker   | 2 +-
>  tests/docker/dockerfiles/debian-ppc64el-cross.docker | 2 +-
>  tests/docker/dockerfiles/debian-s390x-cross.docker   | 2 +-
>  tests/docker/dockerfiles/opensuse-leap.docker| 2 +-
>  tests/docker/dockerfiles/ubuntu2004.docker   | 2 +-
>  tests/docker/dockerfiles/ubuntu2204.docker   | 2 +-
>  tests/lcitool/libvirt-ci | 2 +-
>  11 files changed, 11 insertions(+), 11 deletions(-)

Reviewed-by: Daniel P. Berrangé 


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH v4 2/2] net: add initial support for AF_XDP network backend

2023-09-14 Thread Daniel P . Berrangé
On Wed, Sep 13, 2023 at 08:34:37PM +0200, Ilya Maximets wrote:
> AF_XDP is a network socket family that allows communication directly
> with the network device driver in the kernel, bypassing most or all
> of the kernel networking stack.  In the essence, the technology is
> pretty similar to netmap.  But, unlike netmap, AF_XDP is Linux-native
> and works with any network interfaces without driver modifications.
> Unlike vhost-based backends (kernel, user, vdpa), AF_XDP doesn't
> require access to character devices or unix sockets.  Only access to
> the network interface itself is necessary.
> 
> This patch implements a network backend that communicates with the
> kernel by creating an AF_XDP socket.  A chunk of userspace memory
> is shared between QEMU and the host kernel.  4 ring buffers (Tx, Rx,
> Fill and Completion) are placed in that memory along with a pool of
> memory buffers for the packet data.  Data transmission is done by
> allocating one of the buffers, copying packet data into it and
> placing the pointer into Tx ring.  After transmission, device will
> return the buffer via Completion ring.  On Rx, device will take
> a buffer from a pre-populated Fill ring, write the packet data into
> it and place the buffer into Rx ring.
> 
> AF_XDP network backend takes on the communication with the host
> kernel and the network interface and forwards packets to/from the
> peer device in QEMU.
> 
> Usage example:
> 
>   -device virtio-net-pci,netdev=guest1,mac=00:16:35:AF:AA:5C
>   -netdev af-xdp,ifname=ens6f1np1,id=guest1,mode=native,queues=1
> 
> XDP program bridges the socket with a network interface.  It can be
> attached to the interface in 2 different modes:
> 
> 1. skb - this mode should work for any interface and doesn't require
>  driver support.  With a caveat of lower performance.
> 
> 2. native - this does require support from the driver and allows
> bypassing skb allocation in the kernel and potentially using
> zero-copy while getting packets in/out of userspace.
> 
> By default, QEMU will try to use native mode and fall back to skb.
> Mode can be forced via 'mode' option.  To force 'copy' even in native
> mode, use 'force-copy=on' option.  This might be useful if there is
> some issue with the driver.
> 
> Option 'queues=N' allows specifying how many device queues should
> be open.  Note that all the queues that are not open are still
> functional and can receive traffic, but it will not be delivered to
> QEMU.  So, the number of device queues should generally match the
> QEMU configuration, unless the device is shared with something
> else and the traffic re-direction to appropriate queues is correctly
> configured on a device level (e.g. with ethtool -N).
> 'start-queue=M' option can be used to specify from which queue id
> QEMU should start configuring 'N' queues.  It might also be necessary
> to use this option with certain NICs, e.g. MLX5 NICs.  See the docs
> for examples.
> 
> In a general case QEMU will need CAP_NET_ADMIN and CAP_SYS_ADMIN
> or CAP_BPF capabilities in order to load default XSK/XDP programs to
> the network interface and configure BPF maps.  It is possible, however,
> to run with no capabilities.  For that to work, an external process
> with enough capabilities will need to pre-load default XSK program,
> create AF_XDP sockets and pass their file descriptors to QEMU process
> on startup via 'sock-fds' option.  Network backend will need to be
> configured with 'inhibit=on' to avoid loading of the program.
> QEMU will need 32 MB of locked memory (RLIMIT_MEMLOCK) per queue
> or CAP_IPC_LOCK.
> 
> There are a few performance challenges with the current network backends.
> 
> First is that they do not support IO threads.  This means that data
> path is handled by the main thread in QEMU and may slow down other
> work or may be slowed down by some other work.  This also means that
> taking advantage of multi-queue is generally not possible today.
> 
> Another thing is that data path is going through the device emulation
> code, which is not really optimized for performance.  The fastest
> "frontend" device is virtio-net.  But it's not optimized for heavy
> traffic either, because it expects such use-cases to be handled via
> some implementation of vhost (user, kernel, vdpa).  In practice, we
> have virtio notifications and rcu lock/unlock on a per-packet basis
> and not very efficient accesses to the guest memory.  Communication
> channels between backend and frontend devices do not allow passing
> more than one packet at a time as well.
> 
> Some of these challenges can be avoided in the future by adding better
> batching into device emulation or by implementing vhost-af-xdp variant.
> 
> There are also a few kernel limitations.  AF_XDP sockets do not
> support any kinds of checksum or segmentation offloading.  Buffers
> are limited to a page size (4K), i.e. MTU is limited.  Multi-buffer
> support implementation for AF_XDP is in progress, but not re

Re: [PULL 00/17] Net patches

2023-09-14 Thread Daniel P . Berrangé
On Wed, Sep 13, 2023 at 08:46:42PM +0200, Ilya Maximets wrote:
> On 9/8/23 16:15, Daniel P. Berrangé wrote:
> > On Fri, Sep 08, 2023 at 04:06:35PM +0200, Ilya Maximets wrote:
> >> On 9/8/23 14:15, Daniel P. Berrangé wrote:
> >>> On Fri, Sep 08, 2023 at 02:00:47PM +0200, Ilya Maximets wrote:
>  On 9/8/23 13:49, Daniel P. Berrangé wrote:
> > On Fri, Sep 08, 2023 at 01:34:54PM +0200, Ilya Maximets wrote:
> >> On 9/8/23 13:19, Stefan Hajnoczi wrote:
> >>> Hi Ilya and Jason,
> >>> There is a CI failure related to a missing Debian libxdp-dev package:
> >>> https://gitlab.com/qemu-project/qemu/-/jobs/5046139967
> >>>
> >>> I think the issue is that the debian-amd64 container image that QEMU
> >>> uses for testing is based on Debian 11 ("bullseye" aka "oldstable")
> >>> and libxdp is not available on that release:
> >>> https://packages.debian.org/search?keywords=libxdp&searchon=names&suite=oldstable§ion=all
> >>
> >> Hmm.  Sorry about that.
> >>
> >>>
> >>> If we need to support Debian 11 CI then either XDP could be disabled
> >>> for that distro or libxdp could be compiled from source.
> >>
> >> I'd suggest we just remove the attempt to install the package for now,
> >> building libxdp from sources may be a little painful to maintain.
> >>
> >> Can be re-added later once distributions with libxdp 1.4+ will be more
> >> widely available, i.e. when fedora dockerfile will be updated to 39,
> >> for example.  That should be soon-ish, right?
> >
> > If you follow the process in docs/devel/testing.rst for adding
> > libxdp in libvirt-ci, then lcitool will "do the right thing"
> > when we move the auto-generated dockerfiles to new distro versions.
> 
>  Thanks!  I'll prepare changes for libvirt-ci.
> 
>  In the meantime, none of the currently tested images will have a required
>  version of libxdp anyway, so I'm suggesting to just drop this one 
>  dockerfile
>  modification from the patch.  What do you think?
> >>>
> >>> Sure, if none of the distros have it, then lcitool won't emit the
> >>> dockerfile changes until we update the inherited distro version.
> >>> So it is sufficient to just update libvirt-ci.git with the mappings.yml
> >>> info for libxdp, and add 'libxdp' to the tests/lcitool/projects/qemu.yml
> >>> file in qemu.git. It will then 'just work' when someone updates the
> >>> distro versions later.
> >>
> >> I posted an MR for libvirt-ci adding libxdp:
> >>   https://gitlab.com/libvirt/libvirt-ci/-/merge_requests/429
> >>
> >> Please, take a look.
> >>
> >> The docs say that CI will try to build containers with the MR changes,
> >> but I don't think anything except sanity checks is actually tested on MR.
> >> Sorry if I missed something, never used GitLab pipelines before.
> > 
> > No, that's our fault - we've broken the CI and your change alerted
> > me to that fact :-)
> > 
> >> Note that with this update we will be installing older version of libxdp
> >> in many containers, even though they will not be used by QEMU, unless
> >> they are newer than 1.4.0.
> > 
> > No problem, as it means QEMU CI will demonstrate that the meson.build
> > change is ignoring the outdated libxdp.
> > 
> >> tests/lcitool/projects/qemu.yml in qemu.git cannot be updated without
> >> updating a submodule after the MR merge.
> > 
> > Yep.
> 
> Since all the required changes went into libvirt-ci project, I posted an
> updated patch set named:
> 
>   '[PATCH v4 0/2] net: add initial support for AF_XDP network backend'
> 
> Please, take a look.
> 
> This should fix the CI issues, though I'm not sure how to run QEMU gitlab
> pipelines myself, so I didn't actually test all the images.

  git push gitlab  -o ci.variable=QEMU_CI=2

will create pipeline and immediately run all jobs.

Replace 'gitlab' with the name of the git remote pointing to your
gitlab fork of QEMU.

Using QEMU_CI=1 will create pipeline, but let you manually start
individual jobs from the web UI.

For further details see docs/devel/ci-jobs.rst.inc


> 
> Sent as a patch set because the libvirt-ci submodule bump brings in a few
> unrelated changes.  So, I split that into a separate patch.

Yep, that's perfect thanks.

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH v3 3/3] linux-user/syscall.c: do_ppoll: eliminate large alloca

2023-09-14 Thread Daniel P . Berrangé
On Thu, Sep 14, 2023 at 10:43:37AM +0300, Michael Tokarev wrote:
> do_ppoll() in linux-user/syscall.c uses alloca() to allocate
> an array of struct pollfds on the stack.  The only upper
> boundary for number of entries for this array is so that
> whole thing fits in INT_MAX.  This is definitely too much
> for stack allocation.
> 
> Use heap allocation when large number of entries is requested
> (currently 32, arbitrary), and continue to use alloca() for

Typo ? The code uses 64 rather than 32.

> smaller allocations, to optimize small operations for small
> sizes.  The code for this optimization is small, I see no
> reason for dropping it.
> 
> This eliminates last large user-controlled on-stack allocation
> from syscall.c.
> 
> Signed-off-by: Michael Tokarev 
> ---
>  linux-user/syscall.c | 15 +--
>  1 file changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/linux-user/syscall.c b/linux-user/syscall.c
> index eabdf50abc..1dbe28eba4 100644
> --- a/linux-user/syscall.c
> +++ b/linux-user/syscall.c
> @@ -1489,7 +1489,7 @@ static abi_long do_ppoll(abi_long arg1, abi_long arg2, 
> abi_long arg3,
>  {
>  struct target_pollfd *target_pfd = NULL;
>  unsigned int nfds = arg2;
> -struct pollfd *pfd = NULL;
> +struct pollfd *pfd = NULL, *heap_pfd = NULL;

g_autofree struct pollfd *heap_pfd = NULL;

>  unsigned int i;
>  abi_long ret;
>  
> @@ -1503,7 +1503,17 @@ static abi_long do_ppoll(abi_long arg1, abi_long arg2, 
> abi_long arg3,
>  return -TARGET_EFAULT;
>  }
>  
> -pfd = alloca(sizeof(struct pollfd) * nfds);
> +/* arbitrary "small" number to limit stack usage */
> +if (nfds <= 64) {
> +pfd = alloca(sizeof(struct pollfd) * nfds);
> +} else {
> +heap_pfd = g_try_new(struct pollfd, nfds);
> +if (!heap_pfd) {
> +ret = -TARGET_ENOMEM;
> +goto out;
> +}
> +pfd = heap_pfd;
> +}
>  for (i = 0; i < nfds; i++) {
>  pfd[i].fd = tswap32(target_pfd[i].fd);
>  pfd[i].events = tswap16(target_pfd[i].events);
> @@ -1567,6 +1577,7 @@ static abi_long do_ppoll(abi_long arg1, abi_long arg2, 
> abi_long arg3,
>  }
>  
>  out:
> +g_free(heap_pfd);

This can be dropped with g_autofree usage

>  unlock_user(target_pfd, arg1, sizeof(struct target_pollfd) * nfds);
>  return ret;
>  }
> -- 
> 2.39.2
> 
> 

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH v3] hw/loongarch: Add virtio-mmio bus support

2023-09-14 Thread gaosong

On 2023/9/11 at 4:59 PM, Tianrui Zhao wrote:

Add virtio-mmio bus support for LoongArch, so that devices
could be added in the virtio-mmio bus.

Signed-off-by: Tianrui Zhao 
Change-Id: Ib882005106562e0dfe74122a7fa2430fa081bfb2
---
  hw/loongarch/Kconfig   |  1 +
  hw/loongarch/acpi-build.c  | 25 +
  hw/loongarch/virt.c| 28 
  include/hw/pci-host/ls7a.h |  4 
  4 files changed, 58 insertions(+)


Drop Change-Id, and update virt.rst ('- 4 virtio-mmio transport devices')

Reviewed-by: Song Gao 


diff --git a/hw/loongarch/Kconfig b/hw/loongarch/Kconfig
index 1e7c5b43c5..01ab8ce8e7 100644
--- a/hw/loongarch/Kconfig
+++ b/hw/loongarch/Kconfig
@@ -22,3 +22,4 @@ config LOONGARCH_VIRT
  select DIMM
  select PFLASH_CFI01
  select ACPI_HMAT
+select VIRTIO_MMIO
diff --git a/hw/loongarch/acpi-build.c b/hw/loongarch/acpi-build.c
index ae292fc543..d033fc2271 100644
--- a/hw/loongarch/acpi-build.c
+++ b/hw/loongarch/acpi-build.c
@@ -363,6 +363,30 @@ static void acpi_dsdt_add_tpm(Aml *scope, 
LoongArchMachineState *vms)
  }
  #endif
  
+static void acpi_dsdt_add_virtio(Aml *scope)

+{
+int i;
+hwaddr base = VIRT_VIRTIO_MMIO_BASE;
+hwaddr size = VIRT_VIRTIO_MMIO_SIZE;
+
+for (i = 0; i < VIRT_VIRTIO_MMIO_NUM; i++) {
+uint32_t irq = VIRT_VIRTIO_MMIO_IRQ + i;
+Aml *dev = aml_device("VR%02u", i);
+
+aml_append(dev, aml_name_decl("_HID", aml_string("LNRO0005")));
+aml_append(dev, aml_name_decl("_UID", aml_int(i)));
+aml_append(dev, aml_name_decl("_CCA", aml_int(1)));
+
+Aml *crs = aml_resource_template();
+aml_append(crs, aml_memory32_fixed(base, size, AML_READ_WRITE));
+aml_append(crs, aml_interrupt(AML_CONSUMER, AML_LEVEL, AML_ACTIVE_HIGH,
+   AML_EXCLUSIVE, &irq, 1));
+aml_append(dev, aml_name_decl("_CRS", crs));
+aml_append(scope, dev);
+base += size;
+}
+}
+
  /* build DSDT */
  static void
  build_dsdt(GArray *table_data, BIOSLinker *linker, MachineState *machine)
@@ -381,6 +405,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker, 
MachineState *machine)
  #ifdef CONFIG_TPM
  acpi_dsdt_add_tpm(dsdt, lams);
  #endif
+acpi_dsdt_add_virtio(dsdt);
  /* System State Package */
  scope = aml_scope("\\");
  pkg = aml_package(4);
diff --git a/hw/loongarch/virt.c b/hw/loongarch/virt.c
index 2629128aed..ffef3222da 100644
--- a/hw/loongarch/virt.c
+++ b/hw/loongarch/virt.c
@@ -116,6 +116,25 @@ static void fdt_add_rtc_node(LoongArchMachineState *lams)
  g_free(nodename);
  }
  
+static void fdt_add_virtio_mmio_node(LoongArchMachineState *lams)

+{
+int i;
+MachineState *ms = MACHINE(lams);
+
+for (i = VIRT_VIRTIO_MMIO_NUM - 1; i >= 0; i--) {
+char *nodename;
+hwaddr base = VIRT_VIRTIO_MMIO_BASE + i * VIRT_VIRTIO_MMIO_SIZE;
+
+nodename = g_strdup_printf("/virtio_mmio@%" PRIx64, base);
+qemu_fdt_add_subnode(ms->fdt, nodename);
+qemu_fdt_setprop_string(ms->fdt, nodename,
+"compatible", "virtio,mmio");
+qemu_fdt_setprop_sized_cells(ms->fdt, nodename, "reg",
+ 2, base, 2, VIRT_VIRTIO_MMIO_SIZE);
+g_free(nodename);
+}
+}
+
  static void fdt_add_uart_node(LoongArchMachineState *lams)
  {
  char *nodename;
@@ -560,6 +579,15 @@ static void loongarch_devices_init(DeviceState *pch_pic, 
LoongArchMachineState *
   VIRT_RTC_IRQ - VIRT_GSI_BASE));
  fdt_add_rtc_node(lams);
  
+/* virtio-mmio device */

+for (i = 0; i < VIRT_VIRTIO_MMIO_NUM; i++) {
+hwaddr virtio_base = VIRT_VIRTIO_MMIO_BASE + i * VIRT_VIRTIO_MMIO_SIZE;
+int virtio_irq = VIRT_VIRTIO_MMIO_IRQ - VIRT_GSI_BASE + i;
+sysbus_create_simple("virtio-mmio", virtio_base,
+  qdev_get_gpio_in(pch_pic, virtio_irq));
+}
+fdt_add_virtio_mmio_node(lams);
+
  pm_mem = g_new(MemoryRegion, 1);
  memory_region_init_io(pm_mem, NULL, &loongarch_virt_pm_ops,
NULL, "loongarch_virt_pm", PM_SIZE);
diff --git a/include/hw/pci-host/ls7a.h b/include/hw/pci-host/ls7a.h
index e753449593..96506b9a4c 100644
--- a/include/hw/pci-host/ls7a.h
+++ b/include/hw/pci-host/ls7a.h
@@ -42,6 +42,10 @@
  #define VIRT_RTC_REG_BASE(VIRT_MISC_REG_BASE + 0x00050100)
  #define VIRT_RTC_LEN 0x100
  #define VIRT_SCI_IRQ (VIRT_GSI_BASE + 4)
+#define VIRT_VIRTIO_MMIO_IRQ (VIRT_GSI_BASE + 7)
+#define VIRT_VIRTIO_MMIO_BASE0x1e20
+#define VIRT_VIRTIO_MMIO_SIZE0x200
+#define VIRT_VIRTIO_MMIO_NUM 4



how about setting the number to 8 or larger?

virt machine

arm  : 32
openrisc : 8
riscv: 8




  #define VIRT_PLATFORM_BUS_BASEADDRESS   0x1600
  #define VIRT_PLATFORM_BUS_SIZE  0x200






Re: [PATCH v6 52/57] target/loongarch: Implement xvreplve xvinsve0 xvpickve

2023-09-14 Thread gaosong

On 2023/9/14 at 11:16 AM, Richard Henderson wrote:

On 9/13/23 19:26, Song Gao wrote:

+static bool gen_xvrepl128(DisasContext *ctx, arg_vv_i *a, MemOp mop)
 {
-    int ofs;
-    TCGv_i64 desthigh, destlow, high, low;
+    int index = LSX_LEN / (8 * (1 << mop));

-    if (!avail_LSX(ctx)) {
-        return false;
-    }
-
-    if (!check_vec(ctx, 16)) {
+    if (!check_vec(ctx, 32)) {
         return true;
     }

-    desthigh = tcg_temp_new_i64();
-    destlow = tcg_temp_new_i64();
-    high = tcg_temp_new_i64();
-    low = tcg_temp_new_i64();
+    tcg_gen_gvec_dup_mem(mop, vec_reg_offset(a->vd, 0, mop),
+                         vec_reg_offset(a->vj, a->imm, mop), 16, 16);
+    tcg_gen_gvec_dup_mem(mop, vec_reg_offset(a->vd, index, mop),
+                         vec_reg_offset(a->vj, a->imm + index, mop), 16, 16);


I think this isn't right, because vec_reg_offset(a->vd, 0, mop) is not 
the beginning of the vector for a big-endian host -- remember the xor in 
vec_reg_offset.



You are right.


Better as

    for (i = 0; i < 32; i += 16) {
        tcg_gen_gvec_dup_mem(mop, vec_full_offset(a->vd) + i,
                             vec_reg_offset(a->vj, a->imm, mop) + i,
                             16, 16);
    }


Got it.

Thanks.
Song Gao




Re: [PATCH v6 51/57] target/loongarch: Implement xvinsgr2vr xvpickve2gr

2023-09-14 Thread gaosong

On 2023/9/14 at 11:02 AM, Richard Henderson wrote:

On 9/13/23 19:26, Song Gao wrote:

+static inline int vec_reg_offset(int regno, int index, MemOp mop)
+{
+    const uint8_t size = 1 << mop;
+    int offs = index * size;
+
+#if HOST_BIG_ENDIAN
+    if (size < 8) {
+        offs ^= (8 - size);
+    }
+#endif
+    return offs + vec_full_offset(regno);
+}


Merge the #if into the if:

    if (HOST_BIG_ENDIAN && size < 8)


Got it.

Thanks.
Song Gao




Re: [PATCH v3 3/3] linux-user/syscall.c: do_ppoll: eliminate large alloca

2023-09-14 Thread Michael Tokarev

14.09.2023 11:18, Daniel P. Berrangé wrote:

On Thu, Sep 14, 2023 at 10:43:37AM +0300, Michael Tokarev wrote:

do_ppoll() in linux-user/syscall.c uses alloca() to allocate
an array of struct pollfds on the stack.  The only upper
boundary for number of entries for this array is so that
whole thing fits in INT_MAX.  This is definitely too much
for stack allocation.

Use heap allocation when large number of entries is requested
(currently 32, arbitrary), and continue to use alloca() for


Typo ? The code uses 64 rather than 32.


Yeah, it's a typo, after a few iterations trying to split this
all into pieces and editing in the process.



-struct pollfd *pfd = NULL;
+struct pollfd *pfd = NULL, *heap_pfd = NULL;


g_autofree struct pollfd *heap_pfd = NULL;


...
  
  out:

+g_free(heap_pfd);


This can be dropped with g_autofree usage


Yes, I know this - it was a deliberate choice.
Personally I'm just too used to old-school explicit resource deallocations.
Here, there's a single place where everything gets freed, so there's little
reason to use fancy modern automatic deallocations. To my taste anyway.
Maybe some future modification will add some future ppoll3.. :)

Sure thing I can drop that and change it to autofree.

Thanks,

/mjt



Re: [QEMU PATCH v4 10/13] virtio-gpu: Resource UUID

2023-09-14 Thread Albert Esteve
On Thu, Sep 14, 2023 at 9:17 AM Akihiko Odaki wrote:

> On 2023/09/13 23:18, Albert Esteve wrote:
> > On Wed, Sep 13, 2023 at 3:43 PM Akihiko Odaki wrote:
> > > On 2023/09/13 21:58, Albert Esteve wrote:
> > > > On Wed, Sep 13, 2023 at 2:22 PM Akihiko Odaki wrote:
> > > > > On 2023/09/13 20:34, Albert Esteve wrote:
> > > > > > On Wed, Sep 13, 2023 at 12:34 PM Akihiko Odaki wrote:
> > > > > > > On 2023/09/13 16:55, Albert Esteve wrote:
> > > > > > > > Hi Antonio,
> > > > > > > >
> > > > > > > > If I'm not mistaken, this patch is related with:
> > > > > > > > https://lists.gnu.org/archive/html/qemu-devel/2023-09/msg01853.html
> > > > > > > >
> > > > > > > > IMHO, ideally, virtio-gpu and vhost-user-gpu both would use
> > > > > > > > the infrastructure from the patch I linked to store the
> > > > > > > > virtio objects, so that they can be later shared with other
> > > > > > > > devices.
> > > > > > >
> > > > > > > I don't think such sharing is possible because the resources
> > > > > > > are identified by IDs that are local to the device. That also
> > > > > > > complicates migration.
> > > > > > >
> > > > > > > Regards,
> > > > > > > Akihiko Odaki
> > > > > >
> > > > > > Hi Akihiko,
> > > > > >
> > > > > > As far as I understand, the feature to export dma-bufs from the
> > > > > > virtgpu was introduced as part of the virtio cross-device sharing
> > > > > > proposal [1]. Thus, it shall be possible. When virtgpu ASSIGN_UUID,
> > > > > > it exports and identifies the dmabuf resource, so that when the
> > > > > > dmabuf gets shared inside the guest (e.g., with virtio-video), we
> > > > > > can use the assigned UUID to find the dmabuf in the host (using
> > > > > > the patch that I linked above), and import it.
> > > > > >
> > > > > > [1] - https://lwn.net/Articles/828988/
> > > > >
> > > > > The problem is that virtio-gpu can have other kinds of resources
> > > > > like pixman and OpenGL textures and manage them and DMA-BUFs with
> > > > > a unified resource ID.
> > > >
> > > > I see.
> > > >
> > > > So you cannot change:
> > > >     g_hash_table_insert(g->resource_uuids,
> > > >                         GUINT_TO_POINTER(assign.resource_id), uuid);
> > > > by:
> > > >     virtio_add_dmabuf(uuid, assign.resource_id);
> > >
> > > assign.resource_id is not a DMA-BUF file descriptor, and the underlying
> > > resource may not be a DMA-BUF in the first place.

Re: [PATCH] mem/x86: add processor address space check for VM memory

2023-09-14 Thread David Hildenbrand

On 14.09.23 07:53, Ani Sinha wrote:




On 12-Sep-2023, at 9:04 PM, David Hildenbrand  wrote:

[...]


diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 54838c0c41..d187890675 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -908,9 +908,12 @@ static hwaddr pc_max_used_gpa(PCMachineState *pcms, 
uint64_t pci_hole64_size)
{
 X86CPU *cpu = X86_CPU(first_cpu);

-/* 32-bit systems don't have hole64 thus return max CPU address */
-if (cpu->phys_bits <= 32) {
-return ((hwaddr)1 << cpu->phys_bits) - 1;
+/*
+ * 32-bit systems don't have hole64, but we might have a region for
+ * memory hotplug.
+ */
+if (!(cpu->env.features[FEAT_8000_0001_EDX] & CPUID_EXT2_LM)) {
+return pc_pci_hole64_start() - 1;

Ok this is very confusing! I am looking at pc_pci_hole64_start() function. I 
have a few questions …
(a) pc_get_device_memory_range() returns the size of the device memory as the 
difference between ram_size and maxram_size. But from what I understand, 
ram_size is the actual size of the ram present and maxram_size is the max size 
of ram *after* hot plugging additional memory. How can we assume that the 
additional available space is already occupied by hot plugged memory?


Let's take a look at an example:

$ ./build/qemu-system-x86_64 -m 8g,maxmem=16g,slots=1 \
  -object memory-backend-ram,id=mem0,size=1g \
  -device pc-dimm,memdev=mem0 \
  -nodefaults -nographic -S -monitor stdio

(qemu) info mtree
...
memory-region: system
  - (prio 0, i/o): system
-bfff (prio 0, ram): alias ram-below-4g @pc.ram 
-bfff
- (prio -1, i/o): pci
  000c-000d (prio 1, rom): pc.rom
  000e-000f (prio 1, rom): alias isa-bios @pc.bios 
0002-0003
  fffc- (prio 0, rom): pc.bios
000a-000b (prio 1, i/o): alias smram-region @pci 
000a-000b
000c-000c3fff (prio 1, i/o): alias pam-pci @pci 
000c-000c3fff
000c4000-000c7fff (prio 1, i/o): alias pam-pci @pci 
000c4000-000c7fff
000c8000-000cbfff (prio 1, i/o): alias pam-pci @pci 
000c8000-000cbfff
000cc000-000c (prio 1, i/o): alias pam-pci @pci 
000cc000-000c
000d-000d3fff (prio 1, i/o): alias pam-pci @pci 
000d-000d3fff
000d4000-000d7fff (prio 1, i/o): alias pam-pci @pci 
000d4000-000d7fff
000d8000-000dbfff (prio 1, i/o): alias pam-pci @pci 
000d8000-000dbfff
000dc000-000d (prio 1, i/o): alias pam-pci @pci 
000dc000-000d
000e-000e3fff (prio 1, i/o): alias pam-pci @pci 
000e-000e3fff
000e4000-000e7fff (prio 1, i/o): alias pam-pci @pci 
000e4000-000e7fff
000e8000-000ebfff (prio 1, i/o): alias pam-pci @pci 
000e8000-000ebfff
000ec000-000e (prio 1, i/o): alias pam-pci @pci 
000ec000-000e
000f-000f (prio 1, i/o): alias pam-pci @pci 
000f-000f
fec0-fec00fff (prio 0, i/o): ioapic
fed0-fed003ff (prio 0, i/o): hpet
fee0-feef (prio 4096, i/o): apic-msi
0001-00023fff (prio 0, ram): alias ram-above-4g @pc.ram 
c000-0001
00024000-00047fff (prio 0, i/o): device-memory
  00024000-00027fff (prio 0, ram): mem0


We requested 8G of boot memory, which is split between "<4G" memory and ">=4G" 
memory.

We only place exactly 3G (0x0->0xbfffffff) under 4G, starting at address 0.


I can’t reconcile this with this code for q35:

if (machine->ram_size >= 0xb0000000) {
    lowmem = 0x80000000; // max memory 0x8fffffff or 2.25 GiB
} else {
    lowmem = 0xb0000000; // max memory 0xbfffffff or 3 GiB
}

You assigned 8 GiB to ram, which is > 0xb0000000 (2.75 GiB)



QEMU defaults to the "pc" machine. If you add "-M q35" you get:

address-space: memory
  - (prio 0, i/o): system
-7fff (prio 0, ram): alias ram-below-4g @pc.ram 
-7fff
[...]
0001-00027fff (prio 0, ram): alias ram-above-4g @pc.ram 
8000-0001
00028000-0004bfff (prio 0, i/o): device-memory
  00028000-0002bfff (prio 0, ram): mem0






We leave the remainder (1G) of the <4G addresses available for I/O devices 
(32bit PCI hole).

So we end up with 5G (0x1->

Re: [PATCH v3] hw/loongarch: Add virtio-mmio bus support

2023-09-14 Thread gaosong

On 2023/9/11 at 4:59 PM, Tianrui Zhao wrote:

+static void fdt_add_virtio_mmio_node(LoongArchMachineState *lams)
+{
+int i;
+MachineState *ms = MACHINE(lams);
+
+for (i = VIRT_VIRTIO_MMIO_NUM - 1; i >= 0; i--) {
+char *nodename;
+hwaddr base = VIRT_VIRTIO_MMIO_BASE + i * VIRT_VIRTIO_MMIO_SIZE;
+
+nodename = g_strdup_printf("/virtio_mmio@%" PRIx64, base);
+qemu_fdt_add_subnode(ms->fdt, nodename);
+qemu_fdt_setprop_string(ms->fdt, nodename,
+"compatible", "virtio,mmio");
+qemu_fdt_setprop_sized_cells(ms->fdt, nodename, "reg",
+ 2, base, 2, VIRT_VIRTIO_MMIO_SIZE);

The node is missing the interrupts property.

Thanks.
Song Gao


+g_free(nodename);
+}
+}





[PATCH] target/mips: Fix MSA BZ/BNZ opcodes displacement

2023-09-14 Thread Philippe Mathieu-Daudé
The PC offset is *signed*.

Cc: qemu-sta...@nongnu.org
Reported-by: Sergey Evlashev 
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1624
Fixes: c7a9ef7517 ("target/mips: Introduce decode tree bindings for MSA ASE")
Signed-off-by: Philippe Mathieu-Daudé 
---
 target/mips/tcg/msa.decode | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/mips/tcg/msa.decode b/target/mips/tcg/msa.decode
index 9575289195..4410e2a02e 100644
--- a/target/mips/tcg/msa.decode
+++ b/target/mips/tcg/msa.decode
@@ -31,8 +31,8 @@
 
 @lsa.. rs:5 rt:5 rd:5 ... sa:2 ..   &r
 @ldst   .. sa:s10 ws:5 wd:5  df:2   &msa_i
-@bz_v   .. ... ..wt:5 sa:16 &msa_bz df=3
-@bz .. ...  df:2 wt:5 sa:16 &msa_bz
+@bz_v   .. ... ..wt:5 sa:s16&msa_bz df=3
+@bz .. ...  df:2 wt:5 sa:s16&msa_bz
 @elm_df ..  ..ws:5 wd:5 ..  &msa_elm_df 
df=%elm_df n=%elm_n
 @elm.. .. ws:5 wd:5 ..  &msa_elm
 @vec.. . wt:5 ws:5 wd:5 ..  &msa_r df=0
-- 
2.41.0




Re: [PATCH v1 00/22] vfio: Adopt iommufd

2023-09-14 Thread Eric Auger
Hi Zhenzhong

On 8/30/23 12:37, Zhenzhong Duan wrote:
> Hi All,
>
> As the kernel side iommufd cdev and hot reset feature have been queued,
> also hwpt alloc has been added in Jason's for_next branch [1], I'd like
> to update a new version matching kernel side update and with rfc flag
> removed. Qemu code can be found at [2], look forward more comments!
>
>
> We have done wide testing with different combinations, e.g.:
>
> - PCI devices were tested
> - FD passing and hot reset with some trick.
> - device hotplug test with legacy and iommufd backends
> - with or without vIOMMU for legacy and iommufd backends
> - devices linked to different iommufds
> - VFIO migration with an E800 net card (no dirty sync support) passthrough
> - platform, ccw and ap were only compile-tested due to environment limits
>
>
> Given some iommufd kernel limitations, the iommufd backend is
> not yet fully on par with the legacy backend w.r.t. features like:
> - p2p mappings (you will see related error traces)
> - dirty page sync
> - and etc.
>
>
> Changelog:
> v1:
> - Alloc hwpt instead of using auto hwpt
> - elaborate iommufd code per Nicolin
> - consolidate two patches and drop as.c
> - typo error fix and function rename
>
> I didn't list change log of rfc stage, see [3] if anyone is interested.
>
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd.git
> [2] https://github.com/yiliu1765/qemu/commits/zhenzhong/iommufd_cdev_v1
> [3] https://lists.nongnu.org/archive/html/qemu-devel/2023-07/msg02529.html

Do you have a branch to share?

It does not apply to upstream

Thanks

Eric
>
>
> --
>
> With the introduction of iommufd, the Linux kernel provides a generic
> interface for userspace drivers to propagate their DMA mappings to kernel
> for assigned devices. This series does the porting of the VFIO devices
> onto the /dev/iommu uapi and let it coexist with the legacy implementation.
>
> This QEMU integration is the result of a collaborative work between
> Yi Liu, Yi Sun, Nicolin Chen and Eric Auger.
>
> At QEMU level, interactions with the /dev/iommu are abstracted by a new
> iommufd object (compiled in with the CONFIG_IOMMUFD option).
>
> Any QEMU device (e.g. vfio device) wishing to use /dev/iommu must be
> linked with an iommufd object. In this series, the vfio-pci device is
> granted with such capability (other VFIO devices are not yet ready):
>
> It gets a new optional parameter named iommufd which allows to pass
> an iommufd object:
>
> -object iommufd,id=iommufd0
> -device vfio-pci,host=:02:00.0,iommufd=iommufd0
>
> Note the /dev/iommu and vfio cdev can be externally opened by a
> management layer. In such a case the fd is passed:
>   
> -object iommufd,id=iommufd0,fd=22
> -device vfio-pci,iommufd=iommufd0,fd=23
>
> If the fd parameter is not passed, the fd is opened by QEMU.
> See https://www.mail-archive.com/qemu-devel@nongnu.org/msg937155.html
> for detailed discuss on this requirement.
>
> If no iommufd option is passed to the vfio-pci device, iommufd is not
> used and the end-user gets the behavior based on the legacy vfio iommu
> interfaces:
>
> -device vfio-pci,host=:02:00.0
>
> While the legacy kernel interface is group-centric, the new iommufd
> interface is device-centric, relying on device fd and iommufd.
>
> To support both interfaces in the QEMU VFIO device we reworked the vfio
> container abstraction so that the generic VFIO code can use either
> backend.
>
> The VFIOContainer object becomes a base object derived into
> a) the legacy VFIO container and
> b) the new iommufd based container.
>
> The base object implements generic code such as code related to
> memory_listener and address space management whereas the derived
> objects implement callbacks specific to either BE, legacy and
> iommufd. Indeed each backend has its own way to setup secure context
> and dma management interface. The below diagram shows how it looks
> like with both BEs.
>
>                  VFIO                    AddressSpace/Memory
> +-------+  +----------+  +-----+  +-----+
> |  pci  |  | platform |  |  ap |  | ccw |
> +---+---+  +----+-----+  +--+--+  +--+--+  +------------------+
>     |           |           |        |     |   AddressSpace   |
>     |           |           |        |     +--------+---------+
> +---V-----------V-----------V--------V---+          /
> |            VFIOAddressSpace            | <-------+
> |                   |                    |  MemoryListener
> |          VFIOContainer list            |
> +--------+-------------------+-----------+
>          |                   |
>          |                   |
> +--------V-----+    +--------V--+
> |   iommufd    |    |vfio legacy|
> |  container   |    | container |
> +--------------+    +-----------+

[PATCH] target/mips: Fix TX79 LQ/SQ opcodes

2023-09-14 Thread Philippe Mathieu-Daudé
The base register address offset is *signed*.

Cc: qemu-sta...@nongnu.org
Fixes: 82a9f9 ("target/mips/tx79: Introduce LQ opcode (Load Quadword)")
Signed-off-by: Philippe Mathieu-Daudé 
---
 target/mips/tcg/tx79.decode | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/mips/tcg/tx79.decode b/target/mips/tcg/tx79.decode
index 57d87a2076..578b8c54c0 100644
--- a/target/mips/tcg/tx79.decode
+++ b/target/mips/tcg/tx79.decode
@@ -24,7 +24,7 @@
 @rs .. rs:5  . ..  ..   &r sa=0  rt=0 rd=0
 @rd .. ..  rd:5  . ..   &r sa=0 rs=0 rt=0
 
-@ldst.. base:5 rt:5 offset:16   &i
+@ldst.. base:5 rt:5 offset:s16  &i
 
 ###
 
-- 
2.41.0




RE: [PATCH v1 00/22] vfio: Adopt iommufd

2023-09-14 Thread Duan, Zhenzhong
Hi Eric,

>-Original Message-
>From: Eric Auger 
>Sent: Thursday, September 14, 2023 5:04 PM
>To: Duan, Zhenzhong ; qemu-devel@nongnu.org
>Cc: alex.william...@redhat.com; c...@redhat.com; j...@nvidia.com;
>nicol...@nvidia.com; Martins, Joao ;
>pet...@redhat.com; jasow...@redhat.com; Tian, Kevin ;
>Liu, Yi L ; Sun, Yi Y ; Peng, Chao P
>
>Subject: Re: [PATCH v1 00/22] vfio: Adopt iommufd
>
>Hi Zhenzhong
>
>On 8/30/23 12:37, Zhenzhong Duan wrote:
>> Hi All,
>>
>> As the kernel side iommufd cdev and hot reset feature have been queued,
>> also hwpt alloc has been added in Jason's for_next branch [1], I'd like
>> to update a new version matching kernel side update and with rfc flag
>> removed. Qemu code can be found at [2], look forward more comments!
>>
>>
>> We have done wide testing with different combinations, e.g.:
>>
>> - PCI devices were tested
>> - FD passing and hot reset with some trick.
>> - device hotplug test with legacy and iommufd backends
>> - with or without vIOMMU for legacy and iommufd backends
>> - devices linked to different iommufds
>> - VFIO migration with an E800 net card (no dirty sync support) passthrough
>> - platform, ccw and ap were only compile-tested due to environment limits
>>
>>
>> Given some iommufd kernel limitations, the iommufd backend is
>> not yet fully on par with the legacy backend w.r.t. features like:
>> - p2p mappings (you will see related error traces)
>> - dirty page sync
>> - and etc.
>>
>>
>> Changelog:
>> v1:
>> - Alloc hwpt instead of using auto hwpt
>> - elaborate iommufd code per Nicolin
>> - consolidate two patches and drop as.c
>> - typo error fix and function rename
>>
>> I didn't list change log of rfc stage, see [3] if anyone is interested.
>>
>>
>> [1] https://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd.git
>> [2] https://github.com/yiliu1765/qemu/commits/zhenzhong/iommufd_cdev_v1
>> [3] https://lists.nongnu.org/archive/html/qemu-devel/2023-07/msg02529.html
>
>Do you have a branch to share?
>
>It does not apply to upstream

Sure, https://github.com/yiliu1765/qemu/tree/zhenzhong/iommufd_cdev_v1_rebased
I think this one is already based on today's upstream.

Thanks
Zhenzhong


[PATCH] hw/qxl: move check of slot_id before accessing guest_slots

2023-09-14 Thread Anastasia Belova
If slot_id >= NUM_MEMSLOTS, a buffer overflow is possible, so the check
must come before any d->guest_slots[slot_id] access: d->guest_slots has
only NUM_MEMSLOTS entries.

Fixes: e954ea2873 ("qxl: qxl_add_memslot: remove guest trigerrable panics")
Signed-off-by: Anastasia Belova 
---
 hw/display/qxl.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/hw/display/qxl.c b/hw/display/qxl.c
index 7bb00d68f5..dc618727c0 100644
--- a/hw/display/qxl.c
+++ b/hw/display/qxl.c
@@ -1309,16 +1309,17 @@ static int qxl_add_memslot(PCIQXLDevice *d, uint32_t 
slot_id, uint64_t delta,
 QXLDevMemSlot memslot;
 int i;
 
-guest_start = le64_to_cpu(d->guest_slots[slot_id].slot.mem_start);
-guest_end   = le64_to_cpu(d->guest_slots[slot_id].slot.mem_end);
-
-trace_qxl_memslot_add_guest(d->id, slot_id, guest_start, guest_end);
-
 if (slot_id >= NUM_MEMSLOTS) {
 qxl_set_guest_bug(d, "%s: slot_id >= NUM_MEMSLOTS %d >= %d", __func__,
   slot_id, NUM_MEMSLOTS);
 return 1;
 }
+
+guest_start = le64_to_cpu(d->guest_slots[slot_id].slot.mem_start);
+guest_end   = le64_to_cpu(d->guest_slots[slot_id].slot.mem_end);
+
+trace_qxl_memslot_add_guest(d->id, slot_id, guest_start, guest_end);
+
 if (guest_start > guest_end) {
 qxl_set_guest_bug(d, "%s: guest_start > guest_end 0x%" PRIx64
  " > 0x%" PRIx64, __func__, guest_start, guest_end);
-- 
2.30.2




[PATCH v6 0/3] hw/{i2c,nvme}: mctp endpoint, nvme management interface model

2023-09-14 Thread Klaus Jensen
This adds a generic MCTP endpoint model that other devices may derive
from.

Also included is a very basic implementation of an NVMe-MI device,
supporting only a small subset of the required commands.

Since this all relies on i2c target mode, this can currently only be
used with an SoC that includes the Aspeed I2C controller.

The easiest way to get up and running with this, is to grab my buildroot
overlay[1] (aspeed_ast2600evb_nmi_defconfig). It includes a modified dts
as well as a couple of required packages.

QEMU can then be launched along these lines:

  qemu-system-arm \
-nographic \
-M ast2600-evb \
-kernel output/images/zImage \
-initrd output/images/rootfs.cpio \
-dtb output/images/aspeed-ast2600-evb-nmi.dtb \
-nic user,hostfwd=tcp::-:22 \
-device nmi-i2c,address=0x3a \
-serial mon:stdio

From within the booted system,

  mctp addr add 8 dev mctpi2c15
  mctp link set mctpi2c15 up
  mctp route add 9 via mctpi2c15
  mctp neigh add 9 dev mctpi2c15 lladdr 0x3a
  mi-mctp 1 9 info

Comments are very welcome!

  [1]: https://github.com/birkelund/hwtests/tree/main/br2-external

Signed-off-by: Klaus Jensen 
---
Changes in v6:
- Use nmi_scratch_append() directly where it makes sense. Fixes bug
  observed by Andrew.
- Link to v5: 
https://lore.kernel.org/r/20230905-nmi-i2c-v5-0-0001d372a...@samsung.com

Changes in v5:
- Added a nmi_scratch_append() that asserts available space in the
  scratch buffer. This is a similar defensive strategy as used in
  hw/i2c/mctp.c
- Various small fixups in response to review (Jonathan)
- Link to v4: 
https://lore.kernel.org/r/20230823-nmi-i2c-v4-0-2b0f86e5b...@samsung.com

---
Klaus Jensen (3):
  hw/i2c: add smbus pec utility function
  hw/i2c: add mctp core
  hw/nvme: add nvme management interface model

 MAINTAINERS   |   7 +
 hw/arm/Kconfig|   1 +
 hw/i2c/Kconfig|   4 +
 hw/i2c/mctp.c | 432 ++
 hw/i2c/meson.build|   1 +
 hw/i2c/smbus_master.c |  26 +++
 hw/i2c/trace-events   |  13 ++
 hw/nvme/Kconfig   |   4 +
 hw/nvme/meson.build   |   1 +
 hw/nvme/nmi-i2c.c | 407 +++
 hw/nvme/trace-events  |   6 +
 include/hw/i2c/mctp.h | 125 
 include/hw/i2c/smbus_master.h |   2 +
 include/net/mctp.h|  35 
 14 files changed, 1064 insertions(+)
---
base-commit: 005ad32358f12fe9313a4a01918a55e60d4f39e5
change-id: 20230822-nmi-i2c-d804ed5be7e6

Best regards,
-- 
Klaus Jensen 




Re: [PATCH v9 00/12] Add VIRTIO sound card

2023-09-14 Thread Stefano Garzarella

On Wed, Sep 13, 2023 at 10:33:07AM +0300, Emmanouil Pitsidianakis wrote:

This patch series adds an audio device implementing the recent virtio
sound spec (1.2) and a corresponding PCI wrapper device.

v9 can be found online at:

https://gitlab.com/epilys/qemu/-/tree/virtio-snd-v9

Ref 06e6b17186

Main differences with the v8 patch series [^v8]:

- Addressed [^v8] review comments.
- Add cpu_to_le32(_) and le32_to_cpu(_) conversions for messages from/to
 the guest according to the virtio spec.
- Inlined some functions and types to reduce review complexity.
- Corrected the replies to IO messages; now both Playback and Capture
 work correctly for me. (If you hear cracks in pulseaudio+guest, try
 pipewire+guest).


We are seeing something strange with the virtio-sound Linux driver.
It seems that the driver modifies the buffers after exposing them to
the device via the avail ring.

It seems we have this strange behaviour with this built-in QEMU device,
but also with the vhost-device-sound, so it could be some spec
violation in the Linux driver.

Matias also reported on the v8 of this series:
https://lore.kernel.org/qemu-devel/ZPg60lzXWxHPQJEa@fedora/

Can you check if you have the same behaviour?

Nothing that blocks this series of course, but just to confirm that
there may be something to fix in the Linux driver.

Thanks,
Stefano




[PATCH v6 3/3] hw/nvme: add nvme management interface model

2023-09-14 Thread Klaus Jensen
From: Klaus Jensen 

Add the 'nmi-i2c' device that emulates an NVMe Management Interface
controller.

Initial support is very basic (Read NMI DS, Configuration Get).

This is based on previously posted code by Padmakar Kalghatgi, Arun
Kumar Agasar and Saurav Kumar.

Reviewed-by: Jonathan Cameron 
Signed-off-by: Klaus Jensen 
---
 hw/nvme/Kconfig  |   4 +
 hw/nvme/meson.build  |   1 +
 hw/nvme/nmi-i2c.c| 407 +++
 hw/nvme/trace-events |   6 +
 4 files changed, 418 insertions(+)

diff --git a/hw/nvme/Kconfig b/hw/nvme/Kconfig
index cfa2ab0f9d5a..e1f6360c0f4b 100644
--- a/hw/nvme/Kconfig
+++ b/hw/nvme/Kconfig
@@ -2,3 +2,7 @@ config NVME_PCI
 bool
 default y if PCI_DEVICES || PCIE_DEVICES
 depends on PCI
+
+config NVME_NMI_I2C
+bool
+default y if I2C_MCTP
diff --git a/hw/nvme/meson.build b/hw/nvme/meson.build
index 1a6a2ca2f307..7bc85f31c149 100644
--- a/hw/nvme/meson.build
+++ b/hw/nvme/meson.build
@@ -1 +1,2 @@
 system_ss.add(when: 'CONFIG_NVME_PCI', if_true: files('ctrl.c', 'dif.c', 
'ns.c', 'subsys.c'))
+system_ss.add(when: 'CONFIG_NVME_NMI_I2C', if_true: files('nmi-i2c.c'))
diff --git a/hw/nvme/nmi-i2c.c b/hw/nvme/nmi-i2c.c
new file mode 100644
index ..bf4648db0457
--- /dev/null
+++ b/hw/nvme/nmi-i2c.c
@@ -0,0 +1,407 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * SPDX-FileCopyrightText: Copyright (c) 2023 Samsung Electronics Co., Ltd.
+ *
+ * SPDX-FileContributor: Padmakar Kalghatgi 
+ * SPDX-FileContributor: Arun Kumar Agasar 
+ * SPDX-FileContributor: Saurav Kumar 
+ * SPDX-FileContributor: Klaus Jensen 
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/crc32c.h"
+#include "hw/registerfields.h"
+#include "hw/i2c/i2c.h"
+#include "hw/i2c/mctp.h"
+#include "net/mctp.h"
+#include "trace.h"
+
+/* NVM Express Management Interface 1.2c, Section 3.1 */
+#define NMI_MAX_MESSAGE_LENGTH 4224
+
+#define TYPE_NMI_I2C_DEVICE "nmi-i2c"
+OBJECT_DECLARE_SIMPLE_TYPE(NMIDevice, NMI_I2C_DEVICE)
+
+typedef struct NMIDevice {
+MCTPI2CEndpoint mctp;
+
+uint8_t buffer[NMI_MAX_MESSAGE_LENGTH];
+uint8_t scratch[NMI_MAX_MESSAGE_LENGTH];
+
+size_t  len;
+int64_t pos;
+} NMIDevice;
+
+FIELD(NMI_MCTPD, MT, 0, 7)
+FIELD(NMI_MCTPD, IC, 7, 1)
+
+#define NMI_MCTPD_MT_NMI 0x4
+#define NMI_MCTPD_IC_ENABLED 0x1
+
+FIELD(NMI_NMP, ROR, 7, 1)
+FIELD(NMI_NMP, NMIMT, 3, 4)
+
+#define NMI_NMP_NMIMT_NVME_MI 0x1
+#define NMI_NMP_NMIMT_NVME_ADMIN 0x2
+
+typedef struct NMIMessage {
+uint8_t mctpd;
+uint8_t nmp;
+uint8_t rsvd2[2];
+uint8_t payload[]; /* includes the Message Integrity Check */
+} NMIMessage;
+
+typedef struct NMIRequest {
+   uint8_t opc;
+   uint8_t rsvd1[3];
+   uint32_t dw0;
+   uint32_t dw1;
+   uint32_t mic;
+} NMIRequest;
+
+FIELD(NMI_CMD_READ_NMI_DS_DW0, DTYP, 24, 8)
+
+typedef enum NMIReadDSType {
+NMI_CMD_READ_NMI_DS_SUBSYSTEM   = 0x0,
+NMI_CMD_READ_NMI_DS_PORTS   = 0x1,
+NMI_CMD_READ_NMI_DS_CTRL_LIST   = 0x2,
+NMI_CMD_READ_NMI_DS_CTRL_INFO   = 0x3,
+NMI_CMD_READ_NMI_DS_OPT_CMD_SUPPORT = 0x4,
+NMI_CMD_READ_NMI_DS_MEB_CMD_SUPPORT = 0x5,
+} NMIReadDSType;
+
+#define NMI_STATUS_INVALID_PARAMETER 0x4
+
+static void nmi_scratch_append(NMIDevice *nmi, const void *buf, size_t count)
+{
+assert(nmi->pos + count <= NMI_MAX_MESSAGE_LENGTH);
+
+memcpy(nmi->scratch + nmi->pos, buf, count);
+nmi->pos += count;
+}
+
+static void nmi_set_parameter_error(NMIDevice *nmi, uint8_t bit, uint16_t byte)
+{
+/* NVM Express Management Interface 1.2c, Figure 30 */
+struct resp {
+uint8_t  status;
+uint8_t  bit;
+uint16_t byte;
+};
+
+struct resp buf = {
+.status = NMI_STATUS_INVALID_PARAMETER,
+.bit = bit & 0x3,
+.byte = byte,
+};
+
+nmi_scratch_append(nmi, &buf, sizeof(buf));
+}
+
+static void nmi_set_error(NMIDevice *nmi, uint8_t status)
+{
+const uint8_t buf[4] = {status,};
+
+nmi_scratch_append(nmi, buf, sizeof(buf));
+}
+
+static void nmi_handle_mi_read_nmi_ds(NMIDevice *nmi, NMIRequest *request)
+{
+I2CSlave *i2c = I2C_SLAVE(nmi);
+
+uint32_t dw0 = le32_to_cpu(request->dw0);
+uint8_t dtyp = FIELD_EX32(dw0, NMI_CMD_READ_NMI_DS_DW0, DTYP);
+
+trace_nmi_handle_mi_read_nmi_ds(dtyp);
+
+static const uint8_t nmi_ds_subsystem[36] = {
+0x00,   /* success */
+0x20, 0x00, /* response data length */
+0x00,   /* reserved */
+0x00,   /* number of ports */
+0x01,   /* major version */
+0x01,   /* minor version */
+};
+
+/*
+ * Cannot be static (or const) since we need to patch in the i2c
+ * address.
+ */
+const uint8_t nmi_ds_ports[36] = {
+0x00,   /* success */
+0x20, 0x00, /* response data length */
+0x00,   /* reserved */
+0x02,   /* port type (smbus) */
+0x00,   /* reserved */
+0x40, 0x00, /* maximum mctp transiss

[PATCH v6 1/3] hw/i2c: add smbus pec utility function

2023-09-14 Thread Klaus Jensen
From: Klaus Jensen 

Add i2c_smbus_pec() to calculate the SMBus Packet Error Code for a
message.

Reviewed-by: Jonathan Cameron 
Signed-off-by: Klaus Jensen 
---
 hw/i2c/smbus_master.c | 26 ++
 include/hw/i2c/smbus_master.h |  2 ++
 2 files changed, 28 insertions(+)

diff --git a/hw/i2c/smbus_master.c b/hw/i2c/smbus_master.c
index 6a53c34e70b7..01a8e4700222 100644
--- a/hw/i2c/smbus_master.c
+++ b/hw/i2c/smbus_master.c
@@ -15,6 +15,32 @@
 #include "hw/i2c/i2c.h"
 #include "hw/i2c/smbus_master.h"
 
+static uint8_t crc8(uint16_t data)
+{
+int i;
+
+for (i = 0; i < 8; i++) {
+if (data & 0x8000) {
+data ^= 0x1070U << 3;
+}
+
+data <<= 1;
+}
+
+return (uint8_t)(data >> 8);
+}
+
+uint8_t i2c_smbus_pec(uint8_t crc, uint8_t *buf, size_t len)
+{
+int i;
+
+for (i = 0; i < len; i++) {
+crc = crc8((crc ^ buf[i]) << 8);
+}
+
+return crc;
+}
+
 /* Master device commands.  */
 int smbus_quick_command(I2CBus *bus, uint8_t addr, int read)
 {
diff --git a/include/hw/i2c/smbus_master.h b/include/hw/i2c/smbus_master.h
index bb13bc423c22..d90f81767d86 100644
--- a/include/hw/i2c/smbus_master.h
+++ b/include/hw/i2c/smbus_master.h
@@ -27,6 +27,8 @@
 
 #include "hw/i2c/i2c.h"
 
+uint8_t i2c_smbus_pec(uint8_t crc, uint8_t *buf, size_t len);
+
 /* Master device commands.  */
 int smbus_quick_command(I2CBus *bus, uint8_t addr, int read);
 int smbus_receive_byte(I2CBus *bus, uint8_t addr);

-- 
2.42.0




[PATCH v6 2/3] hw/i2c: add mctp core

2023-09-14 Thread Klaus Jensen
From: Klaus Jensen 

Add an abstract MCTP over I2C endpoint model. This implements MCTP
control message handling as well as handling the actual I2C transport
(packetization).

Devices are intended to derive from this and implement the class
methods.

Parts of this implementation is inspired by code[1] previously posted by
Jonathan Cameron.

Squashed a fix[2] from Matt Johnston.

  [1]: 
https://lore.kernel.org/qemu-devel/20220520170128.4436-1-jonathan.came...@huawei.com/
  [2]: 
https://lore.kernel.org/qemu-devel/20221121080445.ga29...@codeconstruct.com.au/

Tested-by: Jonathan Cameron 
Reviewed-by: Jonathan Cameron 
Signed-off-by: Klaus Jensen 
---
 MAINTAINERS   |   7 +
 hw/arm/Kconfig|   1 +
 hw/i2c/Kconfig|   4 +
 hw/i2c/mctp.c | 432 ++
 hw/i2c/meson.build|   1 +
 hw/i2c/trace-events   |  13 ++
 include/hw/i2c/mctp.h | 125 +++
 include/net/mctp.h|  35 
 8 files changed, 618 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 00562f924f7a..3208ebb1bcde 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3404,6 +3404,13 @@ F: tests/qtest/adm1272-test.c
 F: tests/qtest/max34451-test.c
 F: tests/qtest/isl_pmbus_vr-test.c
 
+MCTP I2C Transport
+M: Klaus Jensen 
+S: Maintained
+F: hw/i2c/mctp.c
+F: include/hw/i2c/mctp.h
+F: include/net/mctp.h
+
 Firmware schema specifications
 M: Philippe Mathieu-Daudé 
 R: Daniel P. Berrange 
diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
index 7e6834844051..5bcb1e0e8a6f 100644
--- a/hw/arm/Kconfig
+++ b/hw/arm/Kconfig
@@ -541,6 +541,7 @@ config ASPEED_SOC
 select DS1338
 select FTGMAC100
 select I2C
+select I2C_MCTP
 select DPS310
 select PCA9552
 select SERIAL
diff --git a/hw/i2c/Kconfig b/hw/i2c/Kconfig
index 14886b35dac2..2b2a50b83d1e 100644
--- a/hw/i2c/Kconfig
+++ b/hw/i2c/Kconfig
@@ -6,6 +6,10 @@ config I2C_DEVICES
 # to any board's i2c bus
 bool
 
+config I2C_MCTP
+bool
+select I2C
+
 config SMBUS
 bool
 select I2C
diff --git a/hw/i2c/mctp.c b/hw/i2c/mctp.c
new file mode 100644
index ..8d8e74567745
--- /dev/null
+++ b/hw/i2c/mctp.c
@@ -0,0 +1,432 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * SPDX-FileCopyrightText: Copyright (c) 2023 Samsung Electronics Co., Ltd.
+ * SPDX-FileContributor: Klaus Jensen 
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/main-loop.h"
+
+#include "hw/qdev-properties.h"
+#include "hw/i2c/i2c.h"
+#include "hw/i2c/smbus_master.h"
+#include "hw/i2c/mctp.h"
+#include "net/mctp.h"
+
+#include "trace.h"
+
+/* DSP0237 1.2.0, Figure 1 */
+typedef struct MCTPI2CPacketHeader {
+uint8_t dest;
+#define MCTP_I2C_COMMAND_CODE 0xf
+uint8_t command_code;
+uint8_t byte_count;
+uint8_t source;
+} MCTPI2CPacketHeader;
+
+typedef struct MCTPI2CPacket {
+MCTPI2CPacketHeader i2c;
+MCTPPacket  mctp;
+} MCTPI2CPacket;
+
+#define i2c_mctp_payload_offset offsetof(MCTPI2CPacket, mctp.payload)
+#define i2c_mctp_payload(buf) (buf + i2c_mctp_payload_offset)
+
+/* DSP0236 1.3.0, Figure 20 */
+typedef struct MCTPControlMessage {
+#define MCTP_MESSAGE_TYPE_CONTROL 0x0
+uint8_t type;
+#define MCTP_CONTROL_FLAGS_RQ   (1 << 7)
+#define MCTP_CONTROL_FLAGS_D(1 << 6)
+uint8_t flags;
+uint8_t command_code;
+uint8_t data[];
+} MCTPControlMessage;
+
+enum MCTPControlCommandCodes {
+MCTP_CONTROL_SET_EID= 0x01,
+MCTP_CONTROL_GET_EID= 0x02,
+MCTP_CONTROL_GET_VERSION= 0x04,
+MCTP_CONTROL_GET_MESSAGE_TYPE_SUPPORT   = 0x05,
+};
+
+#define MCTP_CONTROL_ERROR_UNSUPPORTED_CMD 0x5
+
+#define i2c_mctp_control_data_offset \
+(i2c_mctp_payload_offset + offsetof(MCTPControlMessage, data))
+#define i2c_mctp_control_data(buf) (buf + i2c_mctp_control_data_offset)
+
+/**
+ * The byte count field in the SMBUS Block Write contains the number of bytes
+ * *following* the field itself.
+ *
+ * This is at least 5.
+ *
+ * 1 byte for the MCTP/I2C piggy-backed I2C source address in addition to the
+ * size of the MCTP transport/packet header.
+ */
+#define MCTP_I2C_BYTE_COUNT_OFFSET (sizeof(MCTPPacketHeader) + 1)
+
+void i2c_mctp_schedule_send(MCTPI2CEndpoint *mctp)
+{
+I2CBus *i2c = I2C_BUS(qdev_get_parent_bus(DEVICE(mctp)));
+
+mctp->tx.state = I2C_MCTP_STATE_TX_START_SEND;
+
+i2c_bus_master(i2c, mctp->tx.bh);
+}
+
+static void i2c_mctp_tx(void *opaque)
+{
+DeviceState *dev = DEVICE(opaque);
+I2CBus *i2c = I2C_BUS(qdev_get_parent_bus(dev));
+I2CSlave *slave = I2C_SLAVE(dev);
+MCTPI2CEndpoint *mctp = MCTP_I2C_ENDPOINT(dev);
+MCTPI2CEndpointClass *mc = MCTP_I2C_ENDPOINT_GET_CLASS(mctp);
+MCTPI2CPacket *pkt = (MCTPI2CPacket *)mctp->buffer;
+uint8_t flags = 0;
+
+switch (mctp->tx.state) {
+case I2C_MCTP_STATE_TX_SEND_BYTE:
+if (mctp->pos < mctp->len) {
+uint8_t byte = mctp->buffer[mctp->pos];
+
+t

Re: [PATCH v9 00/12] Add VIRTIO sound card

2023-09-14 Thread Manos Pitsidianakis

On Thu, 14 Sep 2023 12:54, Stefano Garzarella  wrote:

We are seeing something strange with the virtio-sound Linux driver.
It seems that the driver modifies the buffers after exposing them to
the device via the avail ring.


I need more information about this bug. What is the unexpected behavior 
that made you find that the buffer was modified in the first place?


Manos



Re: [sdl-qemu] [PATCH 0/1] There are no checks, virDomainChrSourceDefNew can return 0

2023-09-14 Thread Peter Krempa
CC-ing qemu-devel with a patch solely for libvirt doesn't make sense.

Also the 'libvirt-security' list is private and is intended as a first
contact list for stuff to be embargoed. It makes little sense to include
it when posting to the public 'libvir-list'.

On Thu, Sep 14, 2023 at 09:44:13 +, Миронов Сергей Владимирович wrote:
> There are no checks, virDomainChrSourceDefNew can return 0.

s/0/NULL

While very technically true, realistically that can't happen any more.

'virObjectNew' always returns a valid pointer or abort()s, and
VIR_CLASS_NEW can return 0 on programming errors.

Thus this is not a security issue.

> Return value of a function 'virDomainChrSourceDefNew'
> 
> is dereferenced at qemu_hotplug.c without checking for NULL,
> 
> but it is usually checked for this function.

Remove the extra empty lines please.

> 
> 
> Found by Linux Verification Center (linuxtesting.org) with SVACE.
> 
> 
> Fixes: 1f85f0967b ("ci: jobs.sh: Add back '--no-suite syntax-check 
> --print-errorlogs'")

^^ This makes no sense. The commit you are referencing is changing a
shell script.

> 
> Signed-off-by: Sergey Mironov 
> 
> ---
> src/qemu/qemu_hotplug.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c
> index 177ca87d11..09e16c2c7e 100644
> --- a/src/qemu/qemu_hotplug.c
> +++ b/src/qemu/qemu_hotplug.c
> @@ -3207,6 +3207,8 @@ qemuDomainAttachFSDevice(virQEMUDriver *driver,
>  qemuAssignDeviceFSAlias(vm->def, fs);
> 
>  chardev = virDomainChrSourceDefNew(priv->driver->xmlopt);
> +if (chardev == NULL)
> +   goto cleanup;
>  chardev->type = VIR_DOMAIN_CHR_TYPE_UNIX;
>  chardev->data.nix.path = qemuDomainGetVHostUserFSSocketPath(priv, fs);
> --
> 2.31.1




Re: [PATCH v2 07/24] accel/tcg: Validate placement of CPUNegativeOffsetState

2023-09-14 Thread Anton Johansson via



On 9/14/23 04:44, Richard Henderson wrote:

Verify that the distance between CPUNegativeOffsetState and
CPUArchState is no greater than any alignment requirements.

Signed-off-by: Richard Henderson 
---
  include/exec/cpu-all.h | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index c2c62160c6..86a7452b0d 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -459,6 +459,12 @@ static inline CPUState *env_cpu(CPUArchState *env)
  return &env_archcpu(env)->parent_obj;
  }
  
+/*
+ * Validate placement of CPUNegativeOffsetState.
+ */
+QEMU_BUILD_BUG_ON(offsetof(ArchCPU, env) - offsetof(ArchCPU, neg) >=
+  sizeof(CPUNegativeOffsetState) + __alignof(CPUArchState));
+
  /**
   * env_neg(env)
   * @env: The architecture environment

Reviewed-by: Anton Johansson 



Re: [sdl-qemu] [PATCH 1/1] No checks, dereferencing possible

2023-09-14 Thread Peter Krempa
On Thu, Sep 14, 2023 at 09:44:16 +, Миронов Сергей Владимирович wrote:
> No checks, dereferencing possible.
> 
> 
> Return value of a function 'virDomainChrSourceDefNew'
> is dereferenced at qemu_command.c without checking
> for NULL, but it is usually checked for this function.

This description here doesn't make sense. You are checking the presence
of 'privateData' in 'virDomainVideoDef'.

> 
> 
> Found by Linux Verification Center (linuxtesting.org) with SVACE.
> 
> 
> Fixes: 1f85f0967b ("ci: jobs.sh: Add back '--no-suite syntax-check 
> --print-errorlogs'")
> 
> Signed-off-by: Sergey Mironov 
> 
> ---
> src/qemu/qemu_command.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c
> index e84374b4cf..8d11972c88 100644
> --- a/src/qemu/qemu_command.c
> +++ b/src/qemu/qemu_command.c
> @@ -4698,6 +4698,8 @@ qemuBuildVideoCommandLine(virCommand *cmd,
>  g_autofree char *name = g_strdup_printf("%s-vhost-user", 
> video->info.alias);
>  qemuDomainChrSourcePrivate *chrsrcpriv = 
> QEMU_DOMAIN_CHR_SOURCE_PRIVATE(chrsrc);
> 
> +   if (chrsrc == NULL)
> +   return -1;

This addition doesn't make sense as it's dead code. The private data is
always allocated and checked that it's non-NULL in the qemu driver via
the callback in virDomainVideoDefNew.

Do you have a call trace that would prove me otherwise?




Re: [PATCH v9 00/12] Add VIRTIO sound card

2023-09-14 Thread Stefano Garzarella

On Thu, Sep 14, 2023 at 01:02:05PM +0300, Manos Pitsidianakis wrote:

On Thu, 14 Sep 2023 12:54, Stefano Garzarella  wrote:

We are seeing something strange with the virtio-sound Linux driver.
It seems that the driver modifies the buffers after exposing them to
the device via the avail ring.


I need more information about this bug. What is the unexpected 
behavior that made you find that the buffer was modified in the first 
place?


CCing Matias for more details, but initially can you just run the test
Matias suggested to check if you experience the same behaviour or not?

Stefano




Re: [PATCH v2 08/24] accel/tcg: Move CPUNegativeOffsetState into CPUState

2023-09-14 Thread Anton Johansson via



On 9/14/23 04:44, Richard Henderson wrote:

Retain the separate structure to emphasize its importance.
Enforce CPUArchState always follows CPUState without padding.

Signed-off-by: Richard Henderson 
---
  include/exec/cpu-all.h| 22 +-
  include/hw/core/cpu.h | 14 --
  target/alpha/cpu.h|  1 -
  target/arm/cpu.h  |  1 -
  target/avr/cpu.h  |  1 -
  target/cris/cpu.h |  1 -
  target/hexagon/cpu.h  |  2 +-
  target/hppa/cpu.h |  1 -
  target/i386/cpu.h |  1 -
  target/loongarch/cpu.h|  1 -
  target/m68k/cpu.h |  1 -
  target/microblaze/cpu.h   |  6 +++---
  target/mips/cpu.h |  4 ++--
  target/nios2/cpu.h|  1 -
  target/openrisc/cpu.h |  1 -
  target/ppc/cpu.h  |  1 -
  target/riscv/cpu.h|  2 +-
  target/rx/cpu.h   |  1 -
  target/s390x/cpu.h|  1 -
  target/sh4/cpu.h  |  1 -
  target/sparc/cpu.h|  1 -
  target/tricore/cpu.h  |  1 -
  target/xtensa/cpu.h   |  3 +--
  accel/tcg/translate-all.c |  4 ++--
  accel/tcg/translator.c|  8 
  25 files changed, 35 insertions(+), 46 deletions(-)


Reviewed-by: Anton Johansson 



Re: [PATCH 11/11] qdev: Rework array properties based on list visitor

2023-09-14 Thread Peter Maydell
On Fri, 8 Sept 2023 at 15:37, Kevin Wolf  wrote:
>
> Until now, array properties are actually implemented with a hack that
> uses multiple properties on the QOM level: a static "foo-len" property
> and after it is set, dynamically created "foo[i]" properties.
>
> In external interfaces (-device on the command line and device_add in
> QMP), this interface was broken by commit f3558b1b ('qdev: Base object
> creation on QDict rather than QemuOpts') because QDicts are unordered
> and therefore it could happen that QEMU tried to set the indexed
> properties before setting the length, which fails and effectively makes
> array properties inaccessible. In particular, this affects the 'ports'
> property of the 'rocker' device.
>
> This patch reworks the external interface so that instead of using a
> separate top-level property for the length and for each element, we use
> a single true array property that accepts a list value. In the external
> interfaces, this is naturally expressed as a JSON list and makes array
> properties accessible again.
>
> Creating an array property on the command line without using JSON format
> is currently not possible. This could be fixed by switching from
> QemuOpts to a keyval parser, which however requires consideration of the
> compatibility implications.
>
> All internal users of devices with array properties go through
> qdev_prop_set_array() at this point, so updating it takes care of all of
> them.
>
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1090
> Fixes: f3558b1b763683bb877f7dd5b282469cdadc65c3
> Signed-off-by: Kevin Wolf 

I'm hoping that somebody who understands the visitor APIs
better than me will have a look at this patch, but in the
meantime, here's my review, which I suspect has a lot of
comments that mostly reflect that I don't really understand
the visitor stuff...
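For readers less familiar with the old scheme, the ordering hazard described in the commit message can be reproduced with a toy model of the two-property approach (hypothetical names; the real QEMU property code differs):

```python
# Old scheme: setting "len-foo" dynamically creates "foo[0]"..."foo[N-1]".
class Device:
    def __init__(self):
        self.props = {"len-foo": None}   # only the length property exists

    def set_prop(self, name, value):
        if name not in self.props:
            raise KeyError(f"property {name!r} does not exist (yet)")
        self.props[name] = value
        if name == "len-foo":            # length set: create element props
            for i in range(value):
                self.props.setdefault(f"foo[{i}]", None)

# In order (QemuOpts preserved ordering), everything works:
d = Device()
for name, value in [("len-foo", 2), ("foo[0]", "a"), ("foo[1]", "b")]:
    d.set_prop(name, value)

# From an unordered QDict, the elements may be applied before the length,
# and the first element assignment fails:
d2 = Device()
try:
    for name, value in [("foo[0]", "a"), ("foo[1]", "b"), ("len-foo", 2)]:
        d2.set_prop(name, value)
except KeyError as e:
    print("broken:", e)
```

A single list-valued property, as this patch introduces, sidesteps the problem because there is no ordering dependency between two separate properties.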

> ---
>  include/hw/qdev-properties.h |  23 ++--
>  hw/core/qdev-properties-system.c |   2 +-
>  hw/core/qdev-properties.c| 204 +++
>  3 files changed, 133 insertions(+), 96 deletions(-)
>
> diff --git a/include/hw/qdev-properties.h b/include/hw/qdev-properties.h
> index 7fa2fdb7c9..9370b36b72 100644
> --- a/include/hw/qdev-properties.h
> +++ b/include/hw/qdev-properties.h
> @@ -61,7 +61,7 @@ extern const PropertyInfo qdev_prop_size;
>  extern const PropertyInfo qdev_prop_string;
>  extern const PropertyInfo qdev_prop_on_off_auto;
>  extern const PropertyInfo qdev_prop_size32;
> -extern const PropertyInfo qdev_prop_arraylen;
> +extern const PropertyInfo qdev_prop_array;
>  extern const PropertyInfo qdev_prop_link;
>
>  #define DEFINE_PROP(_name, _state, _field, _prop, _type, ...) {  \
> @@ -115,8 +115,6 @@ extern const PropertyInfo qdev_prop_link;
>  .bitmask= (_bitmask), \
>  .set_default = false)
>
> -#define PROP_ARRAY_LEN_PREFIX "len-"
> -
>  /**
>   * DEFINE_PROP_ARRAY:
>   * @_name: name of the array
> @@ -127,24 +125,21 @@ extern const PropertyInfo qdev_prop_link;
>   * @_arrayprop: PropertyInfo defining what property the array elements have
>   * @_arraytype: C type of the array elements
>   *
> - * Define device properties for a variable-length array _name.  A
> - * static property "len-arrayname" is defined. When the device creator
> - * sets this property to the desired length of array, further dynamic
> - * properties "arrayname[0]", "arrayname[1]", ...  are defined so the
> - * device creator can set the array element values. Setting the
> - * "len-arrayname" property more than once is an error.
> + * Define device properties for a variable-length array _name.  The array is
> + * represented as a list in the visitor interface.
> + *
> + * @_arraytype is required to be movable with memcpy().
>   *
> - * When the array length is set, the @_field member of the device
> + * When the array property is set, the @_field member of the device
>   * struct is set to the array length, and @_arrayfield is set to point
> - * to (zero-initialised) memory allocated for the array.  For a zero
> - * length array, @_field will be set to 0 and @_arrayfield to NULL.
> + * to the memory allocated for the array.
> + *
>   * It is the responsibility of the device deinit code to free the
>   * @_arrayfield memory.
>   */
>  #define DEFINE_PROP_ARRAY(_name, _state, _field,   \
>_arrayfield, _arrayprop, _arraytype) \
> -DEFINE_PROP((PROP_ARRAY_LEN_PREFIX _name), \
> -_state, _field, qdev_prop_arraylen, uint32_t,  \
> +DEFINE_PROP(_name, _state, _field, qdev_prop_array, uint32_t, \
>  .set_default = true,   \
>  .defval.u = 0, \
>  .arrayinfo = &(_arrayprop),\
> diff --git a/hw/core/qdev-properties-system.c 
> b/hw/core/qdev-properties-system.c
> index 6d5d43eda2..f557ee886e 100644
> --- a/hw/core/qdev-pro

Re: [PATCH v2 09/24] accel/tcg: Remove CPUState.icount_decr_ptr

2023-09-14 Thread Anton Johansson via



On 9/14/23 04:44, Richard Henderson wrote:

We can now access icount_decr directly.

Signed-off-by: Richard Henderson 
---
  include/exec/cpu-all.h | 1 -
  include/hw/core/cpu.h  | 2 --
  hw/core/cpu-common.c   | 4 ++--
  3 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index c3c78ed8ab..3b01e4ee25 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -434,7 +434,6 @@ void tcg_exec_unrealizefn(CPUState *cpu);
  static inline void cpu_set_cpustate_pointers(ArchCPU *cpu)
  {
  cpu->parent_obj.env_ptr = &cpu->env;
-cpu->parent_obj.icount_decr_ptr = &cpu->parent_obj.neg.icount_decr;
  }
  
  /* Validate correct placement of CPUArchState. */

diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index 1f289136ec..44955af3bc 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -440,7 +440,6 @@ struct qemu_work_item;
   * @as: Pointer to the first AddressSpace, for the convenience of targets 
which
   *  only have a single AddressSpace
   * @env_ptr: Pointer to subclass-specific CPUArchState field.
- * @icount_decr_ptr: Pointer to IcountDecr field within subclass.
   * @gdb_regs: Additional GDB registers.
   * @gdb_num_regs: Number of total registers accessible to GDB.
   * @gdb_num_g_regs: Number of registers in GDB 'g' packets.
@@ -512,7 +511,6 @@ struct CPUState {
  MemoryRegion *memory;
  
  CPUArchState *env_ptr;

-IcountDecr *icount_decr_ptr;
  
  CPUJumpCache *tb_jmp_cache;
  
diff --git a/hw/core/cpu-common.c b/hw/core/cpu-common.c

index ced66c2b34..08d5bbc873 100644
--- a/hw/core/cpu-common.c
+++ b/hw/core/cpu-common.c
@@ -86,7 +86,7 @@ void cpu_exit(CPUState *cpu)
  qatomic_set(&cpu->exit_request, 1);
  /* Ensure cpu_exec will see the exit request after TCG has exited.  */
  smp_wmb();
-qatomic_set(&cpu->icount_decr_ptr->u16.high, -1);
+qatomic_set(&cpu->neg.icount_decr.u16.high, -1);
  }
  
  static int cpu_common_gdb_read_register(CPUState *cpu, GByteArray *buf, int reg)

@@ -130,7 +130,7 @@ static void cpu_common_reset_hold(Object *obj)
  cpu->halted = cpu->start_powered_off;
  cpu->mem_io_pc = 0;
  cpu->icount_extra = 0;
-qatomic_set(&cpu->icount_decr_ptr->u32, 0);
+qatomic_set(&cpu->neg.icount_decr.u32, 0);
  cpu->can_do_io = 1;
  cpu->exception_index = -1;
  cpu->crash_occurred = false;

Reviewed-by: Anton Johansson 



Re: [PATCH v2 13/24] accel/tcg: Replace CPUState.env_ptr with cpu_env()

2023-09-14 Thread Anton Johansson via



On 9/14/23 04:44, Richard Henderson wrote:

Signed-off-by: Richard Henderson 
---
  include/exec/cpu-all.h   |  1 -
  include/hw/core/cpu.h|  9 ++---
  target/arm/common-semi-target.h  |  2 +-
  accel/tcg/cpu-exec.c |  8 
  accel/tcg/cputlb.c   | 18 +-
  accel/tcg/translate-all.c|  4 ++--
  gdbstub/gdbstub.c|  4 ++--
  gdbstub/user-target.c|  2 +-
  hw/i386/kvm/clock.c  |  2 +-
  hw/intc/mips_gic.c   |  2 +-
  hw/intc/riscv_aclint.c   | 12 ++--
  hw/intc/riscv_imsic.c|  2 +-
  hw/ppc/e500.c|  4 ++--
  hw/ppc/spapr.c   |  2 +-
  linux-user/elfload.c |  4 ++--
  linux-user/i386/cpu_loop.c   |  2 +-
  linux-user/main.c|  4 ++--
  linux-user/signal.c  | 15 +++
  monitor/hmp-cmds-target.c|  2 +-
  semihosting/arm-compat-semi.c|  6 +++---
  semihosting/syscalls.c   | 28 ++--
  target/alpha/translate.c |  4 ++--
  target/arm/cpu.c |  8 
  target/arm/helper.c  |  2 +-
  target/arm/tcg/translate-a64.c   |  4 ++--
  target/arm/tcg/translate.c   |  6 +++---
  target/avr/translate.c   |  2 +-
  target/cris/translate.c  |  4 ++--
  target/hexagon/translate.c   |  4 ++--
  target/hppa/mem_helper.c |  2 +-
  target/hppa/translate.c  |  4 ++--
  target/i386/tcg/sysemu/excp_helper.c |  2 +-
  target/i386/tcg/tcg-cpu.c|  2 +-
  target/i386/tcg/translate.c  |  4 ++--
  target/loongarch/translate.c |  4 ++--
  target/m68k/translate.c  |  4 ++--
  target/microblaze/translate.c|  2 +-
  target/mips/tcg/sysemu/mips-semi.c   |  4 ++--
  target/mips/tcg/translate.c  |  4 ++--
  target/nios2/translate.c |  4 ++--
  target/openrisc/translate.c  |  2 +-
  target/ppc/excp_helper.c | 10 +-
  target/ppc/translate.c   |  4 ++--
  target/riscv/translate.c |  6 +++---
  target/rx/cpu.c  |  3 ---
  target/rx/translate.c|  2 +-
  target/s390x/tcg/translate.c |  2 +-
  target/sh4/op_helper.c   |  2 +-
  target/sh4/translate.c   |  4 ++--
  target/sparc/translate.c |  4 ++--
  target/tricore/translate.c   |  4 ++--
  target/xtensa/translate.c|  4 ++--
  target/i386/tcg/decode-new.c.inc |  2 +-
  53 files changed, 125 insertions(+), 127 deletions(-)

Reviewed-by: Anton Johansson 



Re: [PATCH v2 15/24] accel/tcg: Remove env_neg()

2023-09-14 Thread Anton Johansson via



On 9/14/23 04:44, Richard Henderson wrote:

Replace the single use within env_tlb() and remove.

Signed-off-by: Richard Henderson 
---
  include/exec/cpu-all.h | 13 +
  1 file changed, 1 insertion(+), 12 deletions(-)

diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index 9db8544125..af9516654a 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -451,17 +451,6 @@ static inline CPUState *env_cpu(CPUArchState *env)
  return (void *)env - sizeof(CPUState);
  }
  
-/**

- * env_neg(env)
- * @env: The architecture environment
- *
- * Return the CPUNegativeOffsetState associated with the environment.
- */
-static inline CPUNegativeOffsetState *env_neg(CPUArchState *env)
-{
-return (void *)env - sizeof(CPUNegativeOffsetState);
-}
-
  /**
   * env_tlb(env)
   * @env: The architecture environment
@@ -470,7 +459,7 @@ static inline CPUNegativeOffsetState *env_neg(CPUArchState 
*env)
   */
  static inline CPUTLB *env_tlb(CPUArchState *env)
  {
-return &env_neg(env)->tlb;
+return &env_cpu(env)->neg.tlb;
  }
  
  #endif /* CPU_ALL_H */

Reviewed-by: Anton Johansson 



Re: [PATCH v2 16/24] tcg: Remove TCGContext.tlb_fast_offset

2023-09-14 Thread Anton Johansson via



On 9/14/23 04:44, Richard Henderson wrote:

Now that there is no padding between CPUNegativeOffsetState
and CPUArchState, this value is constant across all targets.

Signed-off-by: Richard Henderson 
---
  include/tcg/tcg.h |  1 -
  accel/tcg/translate-all.c |  2 --
  tcg/tcg.c | 13 +++--
  3 files changed, 7 insertions(+), 9 deletions(-)

Reviewed-by: Anton Johansson 



Re: [PATCH v2 02/24] accel/tcg: Move CPUTLB definitions from cpu-defs.h

2023-09-14 Thread Anton Johansson via


On 9/14/23 04:44, Richard Henderson wrote:

Accept that we will consume space in CPUState for CONFIG_USER_ONLY,
since we cannot test CONFIG_SOFTMMU within hw/core/cpu.h.

Signed-off-by: Richard Henderson
---
  include/exec/cpu-defs.h | 150 
  include/hw/core/cpu.h   | 141 +
  2 files changed, 141 insertions(+), 150 deletions(-)

Reviewed-by: Anton Johansson 

Re: [PATCH] vdpa net: zero vhost_vdpa iova_tree pointer at cleanup

2023-09-14 Thread Lei Yang
QE tested this patch with a real NIC; the guest works well after
cancelling migration.

Tested-by: Lei Yang 

On Thu, Sep 14, 2023 at 11:23 AM Jason Wang  wrote:
>
> On Wed, Sep 13, 2023 at 8:34 PM Eugenio Pérez  wrote:
> >
> > Not zeroing it causes a SIGSEGV if the live migration is cancelled, at
> > net device restart.
> >
> > This is caused because CVQ tries to reuse the iova_tree that is present
> > in the first vhost_vdpa device at the end of vhost_vdpa_net_cvq_start.
> > As a consequence, it tries to access an iova_tree that has already been
> > freed.
> >
> > Fixes: 00ef422e9fbf ("vdpa net: move iova tree creation from init to start")
> > Reported-by: Yanhui Ma 
> > Signed-off-by: Eugenio Pérez 
>
> Acked-by: Jason Wang 
>
> Thanks
>
> > ---
> >  net/vhost-vdpa.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> > index 34202ca009..1714ff4b11 100644
> > --- a/net/vhost-vdpa.c
> > +++ b/net/vhost-vdpa.c
> > @@ -385,6 +385,8 @@ static void vhost_vdpa_net_client_stop(NetClientState 
> > *nc)
> >  dev = s->vhost_vdpa.dev;
> >  if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
> >  g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
> > +} else {
> > +s->vhost_vdpa.iova_tree = NULL;
> >  }
> >  }
> >
> > --
> > 2.39.3
> >
>




Re: [PATCH v3 3/3] linux-user/syscall.c: do_ppoll: eliminate large alloca

2023-09-14 Thread Michael Tokarev

14.09.2023 11:26, Michael Tokarev wrote:

14.09.2023 11:18, Daniel P. Berrangé wrote:

..

-    struct pollfd *pfd = NULL;
+    struct pollfd *pfd = NULL, *heap_pfd = NULL;


g_autofree struct pollfd *heap_pfd = NULL;


...

  out:
+    g_free(heap_pfd);


This can be dropped with g_autofree usage


Yes, I know this, - this was deliberate choice.
Personally I'm just too used to old-school explicit resource deallocations.
Here, there's a single place where everything gets freed, so there's little
reason to use fancy modern automatic deallocations. To my taste anyway.
Maybe some future modifications adding some future ppoll3.. :)

Sure thing I can drop that and change it to autofree.


Should I? If that's easier in todays world :)

/mjt



Re: [PATCH v3 3/3] linux-user/syscall.c: do_ppoll: eliminate large alloca

2023-09-14 Thread Daniel P . Berrangé
On Thu, Sep 14, 2023 at 02:05:21PM +0300, Michael Tokarev wrote:
> 14.09.2023 11:26, Michael Tokarev wrote:
> > 14.09.2023 11:18, Daniel P. Berrangé wrote:
> ..
> > > > -    struct pollfd *pfd = NULL;
> > > > +    struct pollfd *pfd = NULL, *heap_pfd = NULL;
> > > 
> > > g_autofree struct pollfd *heap_pfd = NULL;
> > > 
> > ...
> > > >   out:
> > > > +    g_free(heap_pfd);
> > > 
> > > This can be dropped with g_autofree usage
> > 
> > Yes, I know this, - this was deliberate choice.
> > Personally I'm just too used to old-school explicit resource deallocations.
> > Here, there's a single place where everything gets freed, so there's little
> > reason to use fancy modern automatic deallocations. To my taste anyway.
> > Maybe some future modifications adding some future ppoll3.. :)
> > 
> > Sure thing I can drop that and change it to autofree.
> 
> Should I? If that's easier in todays world :)

I prefer auto-free, but I'm fine with this commit either way, so

  Reviewed-by: Daniel P. Berrangé 


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




[PATCH 2/2] iotests: add new test case for image streaming

2023-09-14 Thread Andrey Zhadchenko via
Check that we can list named block nodes when the block-stream job is
finalized but not yet dismissed.
This previously led to a crash.

Signed-off-by: Andrey Zhadchenko 
---
 tests/qemu-iotests/030 | 17 +
 tests/qemu-iotests/030.out |  4 ++--
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/030 b/tests/qemu-iotests/030
index 98595d47fe..2be407a8da 100755
--- a/tests/qemu-iotests/030
+++ b/tests/qemu-iotests/030
@@ -192,6 +192,23 @@ test_img = os.path.join(iotests.test_dir, 'test.img')
 self.assert_qmp(result, 'return', {})
 
 
+def test_auto_dismiss(self):
+result = self.vm.qmp('block-stream', device='drive0', 
auto_dismiss=False)
+completed = False
+while not completed:
+for event in self.vm.get_qmp_events(wait=True):
+self.assertNotEqual(event['event'], 'BLOCK_JOB_ERROR')
+if event['event'] == 'BLOCK_JOB_COMPLETED':
+self.assert_qmp(event, 'data/device', 'drive0')
+completed = True
+elif event['event'] == 'JOB_STATUS_CHANGE':
+self.assert_qmp(event, 'data/id', 'drive0')
+
+result = self.vm.qmp('query-named-block-nodes')
+result = self.vm.qmp('job-dismiss', id='drive0')
+self.assert_qmp(result, 'return', {})
+
+
 class TestParallelOps(iotests.QMPTestCase):
 num_ops = 4 # Number of parallel block-stream operations
 num_imgs = num_ops * 2 + 1
diff --git a/tests/qemu-iotests/030.out b/tests/qemu-iotests/030.out
index 6d9bee1a4b..af8dac10f9 100644
--- a/tests/qemu-iotests/030.out
+++ b/tests/qemu-iotests/030.out
@@ -1,5 +1,5 @@
-...........................
+............................
 --
-Ran 27 tests
+Ran 28 tests
 
 OK
-- 
2.40.1




[PATCH 0/2] block: do not try to list nearly-dropped filters

2023-09-14 Thread Andrey Zhadchenko via
QEMU crashes on QMP command 'query-named-block-nodes' if we have
finalized but not dismissed block job with filter, for example
block-stream.
This happens because the filter no longer has references from which
QEMU can query block info. Skip such filters while listing block nodes.

This patchset also adds simple test for this case.

Andrey Zhadchenko (2):
  block: do not try to list nearly-dropped filters
  iotests: add new test case for image streaming

 block.c|  7 +--
 block/qapi.c   |  5 +
 tests/qemu-iotests/030 | 17 +
 tests/qemu-iotests/030.out |  4 ++--
 4 files changed, 29 insertions(+), 4 deletions(-)

-- 
2.40.1




[PATCH 1/2] block: do not try to list nearly-dropped filters

2023-09-14 Thread Andrey Zhadchenko via
When the block job ends, it removes the filter from the tree. However, the
last reference to the filter bds is dropped only when the job is destroyed.
So when we have a finalized but not yet dismissed job, if we try
'query-named-block-nodes', QEMU will stumble upon a half-dead filter
and crash, since the filter now has no children, no bs->file, etc.

Example of crash:

Thread 1 "qemu-storage-da" received signal SIGSEGV, Segmentation fault.
0x0048795e in bdrv_refresh_filename (bs=0x7a5280) at ../block.c:7860
7860        assert(QLIST_NEXT(child, next) == NULL);
(gdb) bt
0  0x0048795e in bdrv_refresh_filename (bs=0x7a5280) at ../block.c:7860
1  0x004e2804 in bdrv_block_device_info (blk=0x0, bs=0x7a5280, 
flat=false, errp=0x7fffdaa0) at ../block/qapi.c:58
2  0x0049634e in bdrv_named_nodes_list (flat=false, 
errp=0x7fffdaa0) at ../block.c:6098
3  0x0047fe1b in qmp_query_named_block_nodes (has_flat=false, 
flat=false, errp=0x7fffdaa0) at ../blockdev.c:3097
4  0x005a381d in qmp_marshal_query_named_block_nodes 
(args=0x7fffe80075d0, ret=0x76c38e38, errp=0x76c38e40)
at qapi/qapi-commands-block-core.c:554
5  0x005de657 in do_qmp_dispatch_bh (opaque=0x76c38e08) at 
../qapi/qmp-dispatch.c:128
6  0x00603f42 in aio_bh_call (bh=0x773c00) at ../util/async.c:142
7  0x0060407d in aio_bh_poll (ctx=0x75de00) at ../util/async.c:170
8  0x005eb809 in aio_dispatch (ctx=0x75de00) at ../util/aio-posix.c:421
9  0x00604eea in aio_ctx_dispatch (source=0x75de00, callback=0x0, 
user_data=0x0) at ../util/async.c:312
10 0x77c93d6f in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
11 0x00617508 in glib_pollfds_poll () at ../util/main-loop.c:297
12 0x00617360 in os_host_main_loop_wait (timeout=0) at 
../util/main-loop.c:320
13 0x00617263 in main_loop_wait (nonblocking=0) at 
../util/main-loop.c:596
14 0x0046bbd6 in main (argc=13, argv=0x7fffde28) at 
../storage-daemon/qemu-storage-daemon.c:409

Allow bdrv_block_device_info() to return NULL for filters in such a
state. Modify bdrv_named_nodes_list() to skip bds for which it cannot
get info.

Signed-off-by: Andrey Zhadchenko 
---
 block.c  | 7 +--
 block/qapi.c | 5 +
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/block.c b/block.c
index a307c151a8..e559b0aafa 100644
--- a/block.c
+++ b/block.c
@@ -6145,16 +6145,19 @@ BlockDeviceInfoList *bdrv_named_nodes_list(bool flat,
 BlockDeviceInfoList *list;
 BlockDriverState *bs;
 
+ERRP_GUARD();
 GLOBAL_STATE_CODE();
 
 list = NULL;
 QTAILQ_FOREACH(bs, &graph_bdrv_states, node_list) {
 BlockDeviceInfo *info = bdrv_block_device_info(NULL, bs, flat, errp);
-if (!info) {
+if (*errp) {
 qapi_free_BlockDeviceInfoList(list);
 return NULL;
 }
-QAPI_LIST_PREPEND(list, info);
+if (info) {
+QAPI_LIST_PREPEND(list, info);
+}
 }
 
 return list;
diff --git a/block/qapi.c b/block/qapi.c
index f34f95e0ef..b7b48f7540 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -57,6 +57,11 @@ BlockDeviceInfo *bdrv_block_device_info(BlockBackend *blk,
 return NULL;
 }
 
+/* Do not try to get info from empty filters */
+if (bs->drv->is_filter && !QLIST_FIRST(&bs->children)) {
+return NULL;
+}
+
 bdrv_refresh_filename(bs);
 
 info = g_malloc0(sizeof(*info));
-- 
2.40.1




Re: [PATCH] mem/x86: add processor address space check for VM memory

2023-09-14 Thread Ani Sinha



> On 14-Sep-2023, at 2:07 PM, David Hildenbrand  wrote:
> 
> On 14.09.23 07:53, Ani Sinha wrote:
>>> On 12-Sep-2023, at 9:04 PM, David Hildenbrand  wrote:
>>> 
>>> [...]
>>> 
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 54838c0c41..d187890675 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -908,9 +908,12 @@ static hwaddr pc_max_used_gpa(PCMachineState *pcms, 
> uint64_t pci_hole64_size)
> {
> X86CPU *cpu = X86_CPU(first_cpu);
> 
> -/* 32-bit systems don't have hole64 thus return max CPU address */
> -if (cpu->phys_bits <= 32) {
> -return ((hwaddr)1 << cpu->phys_bits) - 1;
> +/*
> + * 32-bit systems don't have hole64, but we might have a region for
> + * memory hotplug.
> + */
> +if (!(cpu->env.features[FEAT_8000_0001_EDX] & CPUID_EXT2_LM)) {
> +return pc_pci_hole64_start() - 1;
 Ok this is very confusing! I am looking at pc_pci_hole64_start() function. 
 I have a few questions …
 (a) pc_get_device_memory_range() returns the size of the device memory as 
 the difference between ram_size and maxram_size. But from what I 
 understand, ram_size is the actual size of the ram present and maxram_size 
 is the max size of ram *after* hot plugging additional memory. How can we 
 assume that the additional available space is already occupied by hot 
 plugged memory?
>>> 
>>> Let's take a look at an example:
>>> 
>>> $ ./build/qemu-system-x86_64 -m 8g,maxmem=16g,slots=1 \
>>>  -object memory-backend-ram,id=mem0,size=1g \
>>>  -device pc-dimm,memdev=mem0 \
>>>  -nodefaults -nographic -S -monitor stdio
>>> 
>>> (qemu) info mtree
>>> ...
>>> memory-region: system
>>>  0000000000000000-ffffffffffffffff (prio 0, i/o): system
>>>    0000000000000000-00000000bfffffff (prio 0, ram): alias ram-below-4g @pc.ram 0000000000000000-00000000bfffffff
>>>    0000000000000000-ffffffffffffffff (prio -1, i/o): pci
>>>      00000000000c0000-00000000000dffff (prio 1, rom): pc.rom
>>>      00000000000e0000-00000000000fffff (prio 1, rom): alias isa-bios @pc.bios 0000000000020000-000000000003ffff
>>>      00000000fffc0000-00000000ffffffff (prio 0, rom): pc.bios
>>>    00000000000a0000-00000000000bffff (prio 1, i/o): alias smram-region @pci 00000000000a0000-00000000000bffff
>>>    00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-pci @pci 00000000000c0000-00000000000c3fff
>>>    00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-pci @pci 00000000000c4000-00000000000c7fff
>>>    00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-pci @pci 00000000000c8000-00000000000cbfff
>>>    00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-pci @pci 00000000000cc000-00000000000cffff
>>>    00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-pci @pci 00000000000d0000-00000000000d3fff
>>>    00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-pci @pci 00000000000d4000-00000000000d7fff
>>>    00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-pci @pci 00000000000d8000-00000000000dbfff
>>>    00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-pci @pci 00000000000dc000-00000000000dffff
>>>    00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-pci @pci 00000000000e0000-00000000000e3fff
>>>    00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-pci @pci 00000000000e4000-00000000000e7fff
>>>    00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-pci @pci 00000000000e8000-00000000000ebfff
>>>    00000000000ec000-00000000000effff (prio 1, i/o): alias pam-pci @pci 00000000000ec000-00000000000effff
>>>    00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-pci @pci 00000000000f0000-00000000000fffff
>>>    00000000fec00000-00000000fec00fff (prio 0, i/o): ioapic
>>>    00000000fed00000-00000000fed003ff (prio 0, i/o): hpet
>>>    00000000fee00000-00000000feefffff (prio 4096, i/o): apic-msi
>>>    0000000100000000-000000023fffffff (prio 0, ram): alias ram-above-4g @pc.ram 00000000c0000000-00000001ffffffff
>>>    0000000240000000-000000047fffffff (prio 0, i/o): device-memory
>>>      0000000240000000-000000027fffffff (prio 0, ram): mem0
>>> 
>>> 
>>> We requested 8G of boot memory, which is split between "<4G" memory and 
>>> ">=4G" memory.
>>> 
>>> We only place exactly 3G (0x0->0xbfff) under 4G, starting at address 0.
>> I can’t reconcile this with this code for q35:
>>    if (machine->ram_size >= 0xb0000000) {
>> lowmem = 0x80000000; // max memory 0x8fffffff or 2.25 GiB
>> } else {
>> lowmem = 0xb0000000; // max memory 0xbfffffff or 3 GiB
>> }
>> You assigned 8 GiB to ram, which is > 0xb0000000 (2.75 GiB)
> 
> QEMU defaults to the "pc" machine. If you add "-M q35" you get:
> 
> address-space: memory
>  - (prio 0, i/o): system
>-7fff (prio 0, r
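As a cross-check, the device-memory base shown in the mtree dump above follows from the split David describes. A back-of-the-envelope calculation (the real logic lives in hw/i386/pc.c and is more involved; `lowmem = 3 GiB` is the value for this machine configuration):

```python
GiB = 1024 ** 3

ram_size    = 8 * GiB      # -m 8g
maxram_size = 16 * GiB     # maxmem=16g
lowmem      = 3 * GiB      # boot RAM placed below 4 GiB here

# The remainder of boot RAM is aliased at 4 GiB:
above_4g_size = ram_size - lowmem          # 5 GiB
above_4g_end  = 4 * GiB + above_4g_size    # 0x240000000

def align_up(x, a):
    """Round x up to the next multiple of a."""
    return (x + a - 1) // a * a

# device-memory starts at the 1 GiB aligned end of ram-above-4g:
device_mem_base = align_up(above_4g_end, GiB)
print(hex(device_mem_base))   # 0x240000000, matching the mtree dump
```

This matches the `device-memory` region starting at 0x240000000 in the dump, with `mem0` (the 1 GiB pc-dimm) as its first child.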

[risu PATCH v3 5/7] s390x: Update the configure script for s390x support

2023-09-14 Thread Thomas Huth
Auto-detect s390x hosts and add s390x information to the help text.

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Thomas Huth 
---
 configure | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/configure b/configure
index ca2d7db..2f7c580 100755
--- a/configure
+++ b/configure
@@ -58,6 +58,8 @@ guess_arch() {
 ARCH="m68k"
 elif check_define __powerpc64__ ; then
 ARCH="ppc64"
+elif check_define __s390x__ ; then
+ARCH="s390x"
 else
 echo "This cpu is not supported by risu. Try -h. " >&2
 exit 1
@@ -139,7 +141,7 @@ Some influential environment variables:
prefixed with the given string.
 
   ARCH force target architecture instead of trying to detect it.
-   Valid values=[arm|aarch64|ppc64|ppc64le|m68k]
+   Valid values=[arm|aarch64|m68k|ppc64|ppc64le|s390x]
 
   CC   C compiler command
   CFLAGS   C compiler flags
-- 
2.41.0




[risu PATCH v3 3/7] s390x: Add simple s390x.risu file

2023-09-14 Thread Thomas Huth
This only adds a limited set of s390x instructions for initial testing.
More instructions will be added later.

Signed-off-by: Thomas Huth 
---
 s390x.risu | 81 ++
 1 file changed, 81 insertions(+)
 create mode 100644 s390x.risu

diff --git a/s390x.risu b/s390x.risu
new file mode 100644
index 000..1661be6
--- /dev/null
+++ b/s390x.risu
@@ -0,0 +1,81 @@
+###
+# Copyright 2023 Red Hat Inc.
+# All rights reserved. This program and the accompanying materials
+# are made available under the terms of the Eclipse Public License v1.0
+# which accompanies this distribution, and is available at
+# http://www.eclipse.org/legal/epl-v10.html
+#
+# Contributors:
+# Thomas Huth - initial implementation
+###
+
+.mode s390x
+
+# format:RR Add (register + register, 32 bit)
+AR Z 00011010 r1:4 r2:4
+
+# format:RRE Add (register + register, 64 bit)
+AGR Z 10111001 1000  r1:4 r2:4
+
+# format:RRE Add (register + register, 32 bit to 64 bit)
+AGFR Z 10111001 00011000  r1:4 r2:4
+
+# format:RRF-a Add (three registers, 32 bit)
+ARK STFLE45 10111001 1000 r3:4  r1:4 r2:4
+
+# format:RRF-a Add (three registers, 64 bit)
+AGRK STFLE45 10111001 11101000 r3:4  r1:4 r2:4
+
+
+# format:RRE Add Halfword Immediate (32 bit)
+AHI Z 10100111 r1:4 1010 i2:16
+
+# format:RI Add Halfword Immediate (64 bit)
+AGHI Z 10100111 r1:4 1011 i2:16
+
+
+# format:RR Add Logical (32 bit)
+ALR Z 0000 r1:4 r2:4
+
+# format:RRE Add Logical (64 bit)
+ALGR Z 10111001 1010  r1:4 r2:4
+
+# format:RRE Add Logical (32 bit to 64 bit)
+ALGFR Z 10111001 00011010  r1:4 r2:4
+
+
+# format:RRF-c Population Count
+POPCNT STFLE45 10111001 1111 m3:4  r1:4 r2:4
+
+
+## Binary floating point instructions ##
+
+# format:RRE ADD (short BFP)
+AEBR BFP 10110011 1010  r1:4 r2:4
+
+# format:RRE ADD (long BFP)
+ADBR BFP 10110011 00011010  r1:4 r2:4
+
+# format:RRE ADD (extended BFP)
+AXBR BFP 10110011 01001010  r1:4 r2:4
+
+
+# format:RRE COMPARE (short BFP)
+CEBR BFP 10110011 1001  r1:4 r2:4
+
+# format:RRE COMPARE (long BFP)
+CDBR BFP 10110011 00011001  r1:4 r2:4
+
+# format:RRE COMPARE (extended BFP)
+CXBR BFP 10110011 01001001  r1:4 r2:4
+
+
+# format:RRF-e LOAD FP INTEGER (short BFP)
+FIEBRA BFP 10110011 01010111 m3:4 m4:4 r1:4 r2:4
+
+# format:RRF-e LOAD FP INTEGER (long BFP)
+FIDBRA BFP 10110011 0101 m3:4 m4:4 r1:4 r2:4
+
+# format:RRF-e LOAD FP INTEGER (extended BFP)
+FIXBRA BFP 10110011 01000111 m3:4 m4:4 r1:4 r2:4
+
-- 
2.41.0




[risu PATCH v3 2/7] s390x: Add basic s390x support to the C code

2023-09-14 Thread Thomas Huth
With these changes, it is now possible to compile the "risu" binary
for s390x hosts.

Signed-off-by: Thomas Huth 
---
 risu_reginfo_s390x.c | 140 +++
 risu_reginfo_s390x.h |  25 
 risu_s390x.c |  51 
 test_s390x.S |  53 
 4 files changed, 269 insertions(+)
 create mode 100644 risu_reginfo_s390x.c
 create mode 100644 risu_reginfo_s390x.h
 create mode 100644 risu_s390x.c
 create mode 100644 test_s390x.S

diff --git a/risu_reginfo_s390x.c b/risu_reginfo_s390x.c
new file mode 100644
index 000..3fd91b9
--- /dev/null
+++ b/risu_reginfo_s390x.c
@@ -0,0 +1,140 @@
+/**
+ * Copyright 2023 Red Hat Inc.
+ * All rights reserved. This program and the accompanying materials
+ * are made available under the terms of the Eclipse Public License v1.0
+ * which accompanies this distribution, and is available at
+ * http://www.eclipse.org/legal/epl-v10.html
+ *
+ * Contributors:
+ * Thomas Huth - initial implementation
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "risu.h"
+#include "risu_reginfo_s390x.h"
+
+
+const struct option * const arch_long_opts;
+const char * const arch_extra_help;
+
+void process_arch_opt(int opt, const char *arg)
+{
+abort();
+}
+
+void arch_init(void)
+{
+}
+
+int reginfo_size(struct reginfo *ri)
+{
+return sizeof(*ri);
+}
+
+/* reginfo_init: initialize with a ucontext */
+void reginfo_init(struct reginfo *ri, ucontext_t *uc, void *siaddr)
+{
+struct ucontext_extended *uce = (struct ucontext_extended *)uc;
+
+memset(ri, 0, sizeof(*ri));
+
+/*
+ * We can get the size of the instruction by looking at the
+ * first two bits of the instruction
+ */
+switch (*(uint8_t *)siaddr >> 6) {
+case 0:
+ri->faulting_insn = *(uint16_t *)siaddr;
+ri->faulting_insn_len = 2;
+break;
+case 3:
+ri->faulting_insn = ((*(uint32_t *)siaddr) << 16)
+| *(uint16_t *)(siaddr + 4);
+ri->faulting_insn_len = 6;
+break;
+default:
+ri->faulting_insn = *(uint32_t *)siaddr;
+ri->faulting_insn_len = 4;
+}
+
+ri->psw_mask = uce->uc_mcontext.regs.psw.mask;
+ri->pc_offset = (uintptr_t)siaddr - image_start_address;
+
+memcpy(ri->gprs, uce->uc_mcontext.regs.gprs, sizeof(ri->gprs));
+
+ri->fpc = uc->uc_mcontext.fpregs.fpc;
+memcpy(ri->fprs, &uc->uc_mcontext.fpregs.fprs, sizeof(ri->fprs));
+}
+
+/* reginfo_is_eq: compare the reginfo structs, returns nonzero if equal */
+int reginfo_is_eq(struct reginfo *m, struct reginfo *a)
+{
+return m->pc_offset == a->pc_offset &&
+   m->fpc == a->fpc &&
+   memcmp(m->gprs, a->gprs, sizeof(m->gprs)) == 0 &&
+   memcmp(&m->fprs, &a->fprs, sizeof(m->fprs)) == 0;
+}
+
+/* reginfo_dump: print state to a stream, returns nonzero on success */
+int reginfo_dump(struct reginfo *ri, FILE * f)
+{
+int i;
+
+fprintf(f, "  faulting insn 0x%" PRIx64 "\n", ri->faulting_insn);
+fprintf(f, "  PSW mask  0x%" PRIx64 "\n", ri->psw_mask);
+fprintf(f, "  PC offset 0x%" PRIx64 "\n\n", ri->pc_offset);
+
+for (i = 0; i < 16/2; i++) {
+fprintf(f, "\tr%d: %16lx\tr%02d: %16lx\n", i, ri->gprs[i],
+i + 8, ri->gprs[i + 8]);
+}
+fprintf(f, "\n");
+
+for (i = 0; i < 16/2; i++) {
+fprintf(f, "\tf%d: %16lx\tf%02d: %16lx\n",
+i, *(uint64_t *)&ri->fprs[i],
+i + 8, *(uint64_t *)&ri->fprs[i + 8]);
+}
+fprintf(f, "\tFPC: %8x\n\n", ri->fpc);
+
+return !ferror(f);
+}
+
+int reginfo_dump_mismatch(struct reginfo *m, struct reginfo *a, FILE *f)
+{
+int i;
+
+if (m->pc_offset != a->pc_offset) {
+fprintf(f, "Mismatch: PC offset master: [%016lx] - PC offset 
apprentice: [%016lx]\n",
+m->pc_offset, a->pc_offset);
+}
+
+for (i = 0; i < 16; i++) {
+if (m->gprs[i] != a->gprs[i]) {
+fprintf(f, "Mismatch: r%d master: [%016lx] - r%d apprentice: 
[%016lx]\n",
+i, m->gprs[i], i, a->gprs[i]);
+}
+}
+
+for (i = 0; i < 16; i++) {
+if (*(uint64_t *)&m->fprs[i] != *(uint64_t *)&a->fprs[i]) {
+fprintf(f, "Mismatch: f%d master: [%016lx] - f%d apprentice: 
[%016lx]\n",
+i, *(uint64_t *)&m->fprs[i],
+i, *(uint64_t *)&a->fprs[i]);
+}
+}
+
+if (m->fpc != a->fpc) {
+fprintf(f, "Mismatch: FPC master: [%08x] - FPC apprentice: [%08x]\n",
+m->fpc, a->fpc);
+}
+
+return !ferror(f);
+}
diff --git a/risu_reginfo_s390x.h b/risu_reginfo_s390x.h
new file mode 100644
index 000..c65fff7
--- /dev/null
+++ b/risu_reginfo_s390x.h
@@ -0,0 +1,25 @@
+/

[risu RFC PATCH v3 7/7] Add a travis.yml file for testing RISU in the Travis-CI

2023-09-14 Thread Thomas Huth
Travis-CI offers native build machines for aarch64, ppc64le
and s390x, so this is very useful for testing RISU on these
architectures. While compiling works fine for all architectures,
running the binary currently only works for s390x (the aarch64
runner reports a mismatch when comparing the registers, and
the ppc64le runner simply hangs), so we can only run the
resulting binary on s390x right now.

Signed-off-by: Thomas Huth 
---
 Not sure if this is useful for anybody but me since Travis is
 not that popular anymore these days ... so please feel free
 to ignore this patch.

 .travis.yml | 37 +
 1 file changed, 37 insertions(+)
 create mode 100644 .travis.yml

diff --git a/.travis.yml b/.travis.yml
new file mode 100644
index 000..bafa8df
--- /dev/null
+++ b/.travis.yml
@@ -0,0 +1,37 @@
+dist: focal
+language: c
+compiler:
+  - gcc
+addons:
+  apt:
+packages:
+  - perl
+  - perl-modules
+  - liblist-compare-perl
+
+before_script:
+  - ./configure
+script:
+  - set -e
+  - make -j2
+  - ./risugen --numinsns 1000 ${ARCH}.risu ${ARCH}.bin
+
+matrix:
+  include:
+
+- env:
+- ARCH="aarch64"
+  arch: arm64
+
+- env:
+- ARCH="ppc64"
+  arch: ppc64le
+
+- env:
+- ARCH="s390x"
+  arch: s390x
+  after_script:
+  - ./risu --master ${ARCH}.bin > stdout.txt 2> stderr.txt &
+  - sleep 1
+  - ./risu --host localhost ${ARCH}.bin
+  - cat stdout.txt stderr.txt
-- 
2.41.0




[risu PATCH v3 6/7] build-all-archs: Add s390x to the script that builds all architectures

2023-09-14 Thread Thomas Huth
To avoid regressions, let's check s390x also via this file.

Suggested-by: Peter Maydell 
Signed-off-by: Thomas Huth 
---
 build-all-archs | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/build-all-archs b/build-all-archs
index e5dcfc8..e89851b 100755
--- a/build-all-archs
+++ b/build-all-archs
@@ -91,7 +91,8 @@ program_exists() {
 for triplet in i386-linux-gnu i686-linux-gnu x86_64-linux-gnu \
aarch64-linux-gnu arm-linux-gnueabihf \
m68k-linux-gnu \
-   powerpc64le-linux-gnu powerpc64-linux-gnu ; do
+   powerpc64le-linux-gnu powerpc64-linux-gnu \
+   s390x-linux-gnu ; do
 
 if ! program_exists "${triplet}-gcc"; then
 echo "Skipping ${triplet}: no compiler found"
-- 
2.41.0




[risu PATCH v3 0/7] Add support for s390x to RISU

2023-09-14 Thread Thomas Huth
 Hi Peter!

Here are some patches that add basic support for s390x to RISU.
It's still quite limited, e.g. no support for load/store memory
operations yet, but the basics with simple 16-bit or 32-bit
instructions already work fine now.

(I'm also already experimenting in extending RISU to support
instructions with opcodes lengths > 32-bit, but the patches
are not quite ready yet)

v3:
- Use si_addr to get the address of the faulting instruction
  (since the PSW address in the ucontext already points to the
  next instruction)
- Use the kernel header asm/ucontext.h instead of the one from
  glibc since the latter does not provide information about
  vector registers (VRs are not used yet, but will be added later)
- Disable exceptions in the floating point control register so
  we can also test floating point instructions
- Added some more instructions to s390x.risu
- Added RFC patch to compile test aarch64, ppc64le and s390x on
  Travis-CI

v2:
- Removed the code to avoid r14 (return address) and r15 (stack pointer)
  since it is not necessary anymore since commit ad82a069e8d6a
- Initialize the floating point registers in test_s390x.S, too
- Added Acked-bys and Reviewed-bys from v1

Thomas Huth (7):
  Pass siginfo_t->si_addr to the reginfo_init() function
  s390x: Add basic s390x support to the C code
  s390x: Add simple s390x.risu file
  s390x: Add basic risugen perl module for s390x
  s390x: Update the configure script for s390x support
  build-all-archs: Add s390x to the script that builds all architectures
  Add a travis.yml file for testing RISU in the Travis-CI

 .travis.yml|  37 
 build-all-archs|   3 +-
 configure  |   4 +-
 risu.c |  12 +--
 risu.h |   2 +-
 risu_reginfo_aarch64.c |   2 +-
 risu_reginfo_arm.c |   2 +-
 risu_reginfo_i386.c|   2 +-
 risu_reginfo_loongarch64.c |   2 +-
 risu_reginfo_m68k.c|   2 +-
 risu_reginfo_ppc64.c   |   2 +-
 risu_reginfo_s390x.c   | 140 
 risu_reginfo_s390x.h   |  25 +
 risu_s390x.c   |  51 ++
 risugen_s390x.pm   | 186 +
 s390x.risu |  81 
 test_s390x.S   |  53 +++
 17 files changed, 591 insertions(+), 15 deletions(-)
 create mode 100644 .travis.yml
 create mode 100644 risu_reginfo_s390x.c
 create mode 100644 risu_reginfo_s390x.h
 create mode 100644 risu_s390x.c
 create mode 100644 risugen_s390x.pm
 create mode 100644 s390x.risu
 create mode 100644 test_s390x.S

-- 
2.41.0




[risu PATCH v3 1/7] Pass siginfo_t->si_addr to the reginfo_init() function

2023-09-14 Thread Thomas Huth
On s390x, we need the si_addr from the siginfo_t to get to
the address of the illegal instruction (the PSW address in
the ucontext_t is already pointing to the next instruction
there). So let's prepare for that situation and pass the
si_addr to the reginfo_init() function everywhere.

Signed-off-by: Thomas Huth 
---
 risu.c | 12 ++--
 risu.h |  2 +-
 risu_reginfo_aarch64.c |  2 +-
 risu_reginfo_arm.c |  2 +-
 risu_reginfo_i386.c|  2 +-
 risu_reginfo_loongarch64.c |  2 +-
 risu_reginfo_m68k.c|  2 +-
 risu_reginfo_ppc64.c   |  2 +-
 8 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/risu.c b/risu.c
index 714074e..36fc82a 100644
--- a/risu.c
+++ b/risu.c
@@ -106,14 +106,14 @@ static void respond(RisuResult r)
 }
 }
 
-static RisuResult send_register_info(void *uc)
+static RisuResult send_register_info(void *uc, void *siaddr)
 {
 uint64_t paramreg;
 RisuResult res;
 RisuOp op;
 void *extra;
 
-reginfo_init(&ri[MASTER], uc);
+reginfo_init(&ri[MASTER], uc, siaddr);
 op = get_risuop(&ri[MASTER]);
 
 /* Write a header with PC/op to keep in sync */
@@ -178,7 +178,7 @@ static void master_sigill(int sig, siginfo_t *si, void *uc)
 RisuResult r;
 signal_count++;
 
-r = send_register_info(uc);
+r = send_register_info(uc, si->si_addr);
 if (r == RES_OK) {
 advance_pc(uc);
 } else {
@@ -232,13 +232,13 @@ static RisuResult recv_register_info(struct reginfo *ri)
 }
 }
 
-static RisuResult recv_and_compare_register_info(void *uc)
+static RisuResult recv_and_compare_register_info(void *uc, void *siaddr)
 {
 uint64_t paramreg;
 RisuResult res;
 RisuOp op;
 
-reginfo_init(&ri[APPRENTICE], uc);
+reginfo_init(&ri[APPRENTICE], uc, siaddr);
 
 res = recv_register_info(&ri[MASTER]);
 if (res != RES_OK) {
@@ -315,7 +315,7 @@ static void apprentice_sigill(int sig, siginfo_t *si, void *uc)
 RisuResult r;
 signal_count++;
 
-r = recv_and_compare_register_info(uc);
+r = recv_and_compare_register_info(uc, si->si_addr);
 if (r == RES_OK) {
 advance_pc(uc);
 } else {
diff --git a/risu.h b/risu.h
index bdb70c1..2c43384 100644
--- a/risu.h
+++ b/risu.h
@@ -115,7 +115,7 @@ RisuOp get_risuop(struct reginfo *ri);
 uintptr_t get_pc(struct reginfo *ri);
 
 /* initialize structure from a ucontext */
-void reginfo_init(struct reginfo *ri, ucontext_t *uc);
+void reginfo_init(struct reginfo *ri, ucontext_t *uc, void *siaddr);
 
 /* return 1 if structs are equal, 0 otherwise. */
 int reginfo_is_eq(struct reginfo *r1, struct reginfo *r2);
diff --git a/risu_reginfo_aarch64.c b/risu_reginfo_aarch64.c
index be47980..1244454 100644
--- a/risu_reginfo_aarch64.c
+++ b/risu_reginfo_aarch64.c
@@ -82,7 +82,7 @@ int reginfo_size(struct reginfo *ri)
 }
 
 /* reginfo_init: initialize with a ucontext */
-void reginfo_init(struct reginfo *ri, ucontext_t *uc)
+void reginfo_init(struct reginfo *ri, ucontext_t *uc, void *siaddr)
 {
 int i;
 struct _aarch64_ctx *ctx, *extra = NULL;
diff --git a/risu_reginfo_arm.c b/risu_reginfo_arm.c
index 202120b..85a39ac 100644
--- a/risu_reginfo_arm.c
+++ b/risu_reginfo_arm.c
@@ -118,7 +118,7 @@ static void reginfo_init_vfp(struct reginfo *ri, ucontext_t *uc)
 }
 }
 
-void reginfo_init(struct reginfo *ri, ucontext_t *uc)
+void reginfo_init(struct reginfo *ri, ucontext_t *uc, void *siaddr)
 {
 memset(ri, 0, sizeof(*ri)); /* necessary for memcmp later */
 
diff --git a/risu_reginfo_i386.c b/risu_reginfo_i386.c
index e9730be..834b2ed 100644
--- a/risu_reginfo_i386.c
+++ b/risu_reginfo_i386.c
@@ -102,7 +102,7 @@ static void *xsave_feature_buf(struct _xstate *xs, int feature)
 }
 
 /* reginfo_init: initialize with a ucontext */
-void reginfo_init(struct reginfo *ri, ucontext_t *uc)
+void reginfo_init(struct reginfo *ri, ucontext_t *uc, void *siaddr)
 {
 int i, nvecregs;
 struct _fpstate *fp;
diff --git a/risu_reginfo_loongarch64.c b/risu_reginfo_loongarch64.c
index af6ab77..16384f1 100644
--- a/risu_reginfo_loongarch64.c
+++ b/risu_reginfo_loongarch64.c
@@ -81,7 +81,7 @@ static int parse_extcontext(struct sigcontext *sc, struct extctx_layout *extctx)
 }
 
 /* reginfo_init: initialize with a ucontext */
-void reginfo_init(struct reginfo *ri, ucontext_t *context)
+void reginfo_init(struct reginfo *ri, ucontext_t *context, void *siaddr)
 {
 int i;
 struct ucontext *uc = (struct ucontext *)context;
diff --git a/risu_reginfo_m68k.c b/risu_reginfo_m68k.c
index 4c25e77..e29da84 100644
--- a/risu_reginfo_m68k.c
+++ b/risu_reginfo_m68k.c
@@ -33,7 +33,7 @@ int reginfo_size(struct reginfo *ri)
 }
 
 /* reginfo_init: initialize with a ucontext */
-void reginfo_init(struct reginfo *ri, ucontext_t *uc)
+void reginfo_init(struct reginfo *ri, ucontext_t *uc, void *siaddr)
 {
 int i;
 memset(ri, 0, sizeof(*ri));
diff --git a/risu_reginfo_ppc64.c b/risu_reginfo_ppc64.c
index 9899b36..

[risu PATCH v3 4/7] s390x: Add basic risugen perl module for s390x

2023-09-14 Thread Thomas Huth
This implements support for simple 16-bit and 32-bit instructions.
Support for 48-bit instructions and support for load/store memory
instructions is not implemented yet.

Signed-off-by: Thomas Huth 
---
 risugen_s390x.pm | 186 +++
 1 file changed, 186 insertions(+)
 create mode 100644 risugen_s390x.pm

diff --git a/risugen_s390x.pm b/risugen_s390x.pm
new file mode 100644
index 000..260e2dd
--- /dev/null
+++ b/risugen_s390x.pm
@@ -0,0 +1,186 @@
+#!/usr/bin/perl -w
+###
+# Copyright 2023 Red Hat Inc.
+# All rights reserved. This program and the accompanying materials
+# are made available under the terms of the Eclipse Public License v1.0
+# which accompanies this distribution, and is available at
+# http://www.eclipse.org/legal/epl-v10.html
+#
+# Contributors:
+# Thomas Huth - initial implementation (based on risugen_ppc64.pm etc.)
+###
+
+# risugen -- generate a test binary file for use with risu
+# See 'risugen --help' for usage information.
+package risugen_s390x;
+
+use strict;
+use warnings;
+
+use risugen_common;
+
+require Exporter;
+
+our @ISA= qw(Exporter);
+our @EXPORT = qw(write_test_code);
+
+my $periodic_reg_random = 1;
+
+# Maximum alignment restriction permitted for a memory op.
+my $MAXALIGN = 64;
+
+sub write_mov_ri($$$)
+{
+my ($r, $imm_h, $imm_l) = @_;
+
+# LGFI
+insn16(0xc0 << 8 | $r << 4 | 0x1);
+insn32($imm_l);
+# IIHF r,imm_high
+insn16(0xc0 << 8 | $r << 4 | 0x8);
+insn32($imm_h);
+}
+
+sub write_mov_fp($$)
+{
+my ($r, $imm) = @_;
+
+write_mov_ri(0, ~$imm, $imm);
+# LDGR
+insn32(0xb3c1 << 16 | $r << 4);
+}
+
+sub write_random_regdata()
+{
+# Floating point registers
+for (my $i = 0; $i < 16; $i++) {
+write_mov_fp($i, rand(0x));
+}
+
+# Load FPC (via r0)
+write_mov_ri(0, 0, (rand(0x) & 0x00fcff77));
+insn32(0xb384);
+
+# general purpose registers
+for (my $i = 0; $i < 16; $i++) {
+write_mov_ri($i, rand(0x), rand(0x));
+}
+}
+
+my $OP_COMPARE = 0;# compare registers
+my $OP_TESTEND = 1;# end of test, stop
+
+sub write_random_register_data()
+{
+write_random_regdata();
+write_risuop($OP_COMPARE);
+}
+
+sub gen_one_insn($$)
+{
+# Given an instruction-details array, generate an instruction
+my $constraintfailures = 0;
+
+INSN: while(1) {
+my ($forcecond, $rec) = @_;
+my $insn = int(rand(0x));
+my $insnname = $rec->{name};
+my $insnwidth = $rec->{width};
+my $fixedbits = $rec->{fixedbits};
+my $fixedbitmask = $rec->{fixedbitmask};
+my $constraint = $rec->{blocks}{"constraints"};
+my $memblock = $rec->{blocks}{"memory"};
+
+$insn &= ~$fixedbitmask;
+$insn |= $fixedbits;
+
+if (defined $constraint) {
+# user-specified constraint: evaluate in an environment
+# with variables set corresponding to the variable fields.
+my $v = eval_with_fields($insnname, $insn, $rec, "constraints", $constraint);
+if (!$v) {
+$constraintfailures++;
+if ($constraintfailures > 1) {
+print "1 consecutive constraint failures for $insnname constraints string:\n$constraint\n";
+exit (1);
+}
+next INSN;
+}
+}
+
+# OK, we got a good one
+$constraintfailures = 0;
+
+my $basereg;
+
+if (defined $memblock) {
+die "memblock handling has not been implemented yet."
+}
+
+if ($insnwidth == 16) {
+insn16(($insn >> 16) & 0x);
+} else {
+insn32($insn);
+}
+
+return;
+}
+}
+
+sub write_risuop($)
+{
+my ($op) = @_;
+insn32(0x835a0f00 | $op);
+}
+
+sub write_test_code($)
+{
+my ($params) = @_;
+
+my $condprob = $params->{ 'condprob' };
+my $numinsns = $params->{ 'numinsns' };
+my $outfile = $params->{ 'outfile' };
+
+my %insn_details = %{ $params->{ 'details' } };
+my @keys = @{ $params->{ 'keys' } };
+
+set_endian(1);
+
+open_bin($outfile);
+
+# convert from probability that insn will be conditional to
+# probability of forcing insn to unconditional
+$condprob = 1 - $condprob;
+
+# TODO better random number generator?
+srand(0);
+
+print "Generating code using patterns: @keys...\n";
+progress_start(78, $numinsns);
+
+if (grep { defined($insn_details{$_}->{blocks}->{"memory"}) } @keys) {
+write_memblock_setup();
+}
+
+# memblock setup doesn't clean its registers, so this must come afterwards.
+write_random_register_data();
+
+for my $i (1..$numinsns) {
+my $insn_enc = $keys[in

Re: [PATCH] mem/x86: add processor address space check for VM memory

2023-09-14 Thread David Hildenbrand

We requested to hotplug a maximum of "8 GiB", and sized the area slightly 
larger to allow for some flexibility
when it comes to placing DIMMs in that "device-memory" area.

Right but here in this example you do not hot plug memory while the VM is 
running. We can hot plug 8G yes, but the memory may not physically exist yet 
(and may never exist). How can we use this math to provision device-memory when 
the memory may not exist physically?


We simply reserve a region in GPA space where we can coldplug and hotplug a
predefined maximum amount of memory we can hotplug.

What do you think is wrong with that?


The only issue I have is that even though we are accounting for it, the memory 
actually might not be physically present.


Not sure if "accounting" is the right word; the memory is not present 
and nowhere indicated as present. It's just a reservation of GPA space, 
like the PCI hole is as well.


[...]



Yes. In this case ms->ram_size == ms->maxram_size and you cannot cold/hotplug 
any memory devices.

See how pc_memory_init() doesn't call machine_memory_devices_init() in that 
case.

That's what the QEMU user asked for when *not* specifying maxmem (e.g., -m 4g).

In order to cold/hotplug any memory devices, you have to tell QEMU ahead of 
time how much memory
you are intending to provide using memory devices (DIMM, NVDIMM, virtio-pmem, 
virtio-mem).


So that means that when we are actually hot plugging the memory, there is no 
need to actually perform additional checks. It can be done statically when -mem 
and -maxmem etc are provided in the command line.


What memory device code does, is find a free location inside the 
reserved GPA space for memory devices. Then, it maps that device at that
location.

[...]


/*
* The 64bit pci hole starts after "above 4G RAM" and
* potentially the space reserved for memory hotplug.
*/

There is the
ROUND_UP(hole64_start, 1 * GiB);
in there that is not really required for the !hole64 case. It
shouldn't matter much in practice I think (besides an aligned value
showing up in the error message).

We could factor out most of that calculation into a
separate function, skipping that alignment to make that
clearer.


Yeah this whole memory segmentation is quite complicated and might benefit from 
a qemu doc or a refactoring.


Absolutely. Do you have time to work on that (including the updated fix?).

--
Cheers,

David / dhildenb




Re: [PULL 14/14] ui: add precondition for dpy_get_ui_info()

2023-09-14 Thread Daniel P . Berrangé
On Tue, Sep 12, 2023 at 02:46:48PM +0400, marcandre.lur...@redhat.com wrote:
> From: Marc-André Lureau 
> 
> Ensure that it only get called when dpy_ui_info_supported(). The
> function should always return a result. There should be a non-null
> console or active_console.

Empirically that does not appear to be the case. After this patch,
a no-args QEMU launch immediately aborts:

$ ./build/qemu-system-x86_64 
qemu-system-x86_64: ../ui/console.c:818: dpy_get_ui_info: Assertion `dpy_ui_info_supported(con)' failed.
Aborted (core dumped)

This ought to be running the GTK UI for me. Manually ask for
SDL instead and it doesn't crash.

> 
> Modify the argument to be const as well.
> 
> Signed-off-by: Marc-André Lureau 
> Reviewed-by: Albert Esteve 
> ---
>  include/ui/console.h | 2 +-
>  ui/console.c | 4 +++-
>  2 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/include/ui/console.h b/include/ui/console.h
> index 79e4702912..28882f15a5 100644
> --- a/include/ui/console.h
> +++ b/include/ui/console.h
> @@ -329,7 +329,7 @@ void update_displaychangelistener(DisplayChangeListener *dcl,
>uint64_t interval);
>  void unregister_displaychangelistener(DisplayChangeListener *dcl);
>  
> -bool dpy_ui_info_supported(QemuConsole *con);
> +bool dpy_ui_info_supported(const QemuConsole *con);
>  const QemuUIInfo *dpy_get_ui_info(const QemuConsole *con);
>  int dpy_set_ui_info(QemuConsole *con, QemuUIInfo *info, bool delay);
>  
> diff --git a/ui/console.c b/ui/console.c
> index aa1e09462c..4a4f19ed33 100644
> --- a/ui/console.c
> +++ b/ui/console.c
> @@ -801,7 +801,7 @@ static void dpy_set_ui_info_timer(void *opaque)
>  con->hw_ops->ui_info(con->hw, head, &con->ui_info);
>  }
>  
> -bool dpy_ui_info_supported(QemuConsole *con)
> +bool dpy_ui_info_supported(const QemuConsole *con)
>  {
>  if (con == NULL) {
>  con = active_console;
> @@ -815,6 +815,8 @@ bool dpy_ui_info_supported(QemuConsole *con)
>  
>  const QemuUIInfo *dpy_get_ui_info(const QemuConsole *con)
>  {
> +assert(dpy_ui_info_supported(con));
> +
>  if (con == NULL) {
>  con = active_console;
>  }
> -- 
> 2.41.0
> 
> 

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH v2 0/3] docs: update x86 CPU model ABI matrix docs

2023-09-14 Thread Daniel P . Berrangé
Ping for review please. This series still applies to git master.

On Tue, Jul 18, 2023 at 10:26:28AM +0100, Daniel P. Berrangé wrote:
> Changed in v2:
> 
>  - Tweaked commit messages
>  - Also add GraniteRapids CPU model
> 
> Daniel P. Berrangé (3):
>   scripts: drop comment about autogenerated CPU API file
>   docs: fix highlighting of CPU ABI header rows
>   docs: re-generate x86_64 ABI compatibility CSV
> 
>  docs/system/cpu-models-x86-abi.csv | 20 ++--
>  docs/system/cpu-models-x86.rst.inc |  2 +-
>  scripts/cpu-x86-uarch-abi.py   |  1 -
>  3 files changed, 19 insertions(+), 4 deletions(-)
> 
> -- 
> 2.41.0
> 

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH 11/11] qdev: Rework array properties based on list visitor

2023-09-14 Thread Kevin Wolf
Am 14.09.2023 um 12:24 hat Peter Maydell geschrieben:
> On Fri, 8 Sept 2023 at 15:37, Kevin Wolf  wrote:
> >
> > Until now, array properties are actually implemented with a hack that
> > uses multiple properties on the QOM level: a static "foo-len" property
> > and after it is set, dynamically created "foo[i]" properties.
> >
> > In external interfaces (-device on the command line and device_add in
> > QMP), this interface was broken by commit f3558b1b ('qdev: Base object
> > creation on QDict rather than QemuOpts') because QDicts are unordered
> > and therefore it could happen that QEMU tried to set the indexed
> > properties before setting the length, which fails and effectively makes
> > array properties inaccessible. In particular, this affects the 'ports'
> > property of the 'rocker' device.
> >
> > This patch reworks the external interface so that instead of using a
> > separate top-level property for the length and for each element, we use
> > a single true array property that accepts a list value. In the external
> > interfaces, this is naturally expressed as a JSON list and makes array
> > properties accessible again.
> >
> > Creating an array property on the command line without using JSON format
> > is currently not possible. This could be fixed by switching from
> > QemuOpts to a keyval parser, which however requires consideration of the
> > compatibility implications.
> >
> > All internal users of devices with array properties go through
> > qdev_prop_set_array() at this point, so updating it takes care of all of
> > them.
> >
> > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1090
> > Fixes: f3558b1b763683bb877f7dd5b282469cdadc65c3
> > Signed-off-by: Kevin Wolf 
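
[Editorial note: a hedged illustration of the interface change described in
the quoted commit message above — the rocker port names here are invented
for the example, and the JSON form is the only way to pass the list from
the command line at this point:]

```
# Old, broken-by-f3558b1b interface (a "len-" pseudo-property plus
# order-sensitive indexed properties):
#   -device rocker,name=sw1,len-ports=2,ports[0]=sw1.1,ports[1]=sw1.2
# New interface: a single true array property, passed as a JSON list:
-device '{"driver": "rocker", "name": "sw1", "ports": ["sw1.1", "sw1.2"]}'
```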
> 
> I'm hoping that somebody who understands the visitor APIs
> better than me will have a look at this patch, but in the
> meantime, here's my review, which I suspect has a lot of
> comments that mostly reflect that I don't really understand
> the visitor stuff...

I discussed the visitor aspects with Markus before sending the series,
so I think he agrees with the approach. But I wouldn't mind an explicit
Reviewed-by, of course.

> > ---
> >  include/hw/qdev-properties.h |  23 ++--
> >  hw/core/qdev-properties-system.c |   2 +-
> >  hw/core/qdev-properties.c| 204 +++
> >  3 files changed, 133 insertions(+), 96 deletions(-)
> >
> > diff --git a/include/hw/qdev-properties.h b/include/hw/qdev-properties.h
> > index 7fa2fdb7c9..9370b36b72 100644
> > --- a/include/hw/qdev-properties.h
> > +++ b/include/hw/qdev-properties.h
> > @@ -61,7 +61,7 @@ extern const PropertyInfo qdev_prop_size;
> >  extern const PropertyInfo qdev_prop_string;
> >  extern const PropertyInfo qdev_prop_on_off_auto;
> >  extern const PropertyInfo qdev_prop_size32;
> > -extern const PropertyInfo qdev_prop_arraylen;
> > +extern const PropertyInfo qdev_prop_array;
> >  extern const PropertyInfo qdev_prop_link;
> >
> >  #define DEFINE_PROP(_name, _state, _field, _prop, _type, ...) {  \
> > @@ -115,8 +115,6 @@ extern const PropertyInfo qdev_prop_link;
> >  .bitmask= (_bitmask), \
> >  .set_default = false)
> >
> > -#define PROP_ARRAY_LEN_PREFIX "len-"
> > -
> >  /**
> >   * DEFINE_PROP_ARRAY:
> >   * @_name: name of the array
> > @@ -127,24 +125,21 @@ extern const PropertyInfo qdev_prop_link;
> >   * @_arrayprop: PropertyInfo defining what property the array elements have
> >   * @_arraytype: C type of the array elements
> >   *
> > - * Define device properties for a variable-length array _name.  A
> > - * static property "len-arrayname" is defined. When the device creator
> > - * sets this property to the desired length of array, further dynamic
> > - * properties "arrayname[0]", "arrayname[1]", ...  are defined so the
> > - * device creator can set the array element values. Setting the
> > - * "len-arrayname" property more than once is an error.
> > + * Define device properties for a variable-length array _name.  The array is
> > + * represented as a list in the visitor interface.
> > + *
> > + * @_arraytype is required to be movable with memcpy().
> >   *
> > - * When the array length is set, the @_field member of the device
> > + * When the array property is set, the @_field member of the device
> >   * struct is set to the array length, and @_arrayfield is set to point
> > - * to (zero-initialised) memory allocated for the array.  For a zero
> > - * length array, @_field will be set to 0 and @_arrayfield to NULL.
> > + * to the memory allocated for the array.
> > + *
> >   * It is the responsibility of the device deinit code to free the
> >   * @_arrayfield memory.
> >   */
> >  #define DEFINE_PROP_ARRAY(_name, _state, _field,   \
> >_arrayfield, _arrayprop, _arraytype) \
> > -DEFINE_PROP((PROP_ARRAY_LEN_PREFIX _name), \
> > -_state, _field, qdev_prop_arraylen, uint32_t,  \
> > +D

Re: [PATCH v3 2/5] test-bdrv-drain: avoid race with BH in IOThread drain test

2023-09-14 Thread Stefan Hajnoczi
On Wed, Sep 13, 2023 at 11:08:54AM -0500, Eric Blake wrote:
> On Tue, Sep 12, 2023 at 07:10:34PM -0400, Stefan Hajnoczi wrote:
> > This patch fixes a race condition in test-bdrv-drain that is difficult
> > to reproduce. test-bdrv-drain sometimes fails without an error message
> > on the block pull request sent by Kevin Wolf on Sep 4, 2023. I was able
> > to reproduce it locally and found that "block-backend: process I/O in
> > the current AioContext" (in this patch series) is the first commit where
> > it reproduces.
> > 
> > I do not know why "block-backend: process I/O in the current AioContext"
> > exposes this bug. It might be related to the fact that the test's preadv
> > request runs in the main thread instead of IOThread a after my commit.
> 
> In reading the commit message before the impacted code, my first
> thought was that you had a typo of an extra word (that is, something
> to fix by s/a //), but reading further, a better fix would be calling
> attention to the fact that you are referencing a specific named
> thread, as in s/IOThread a/IOThread A/...
> 
> > That might simply change the timing of the test.
> > 
> > Now on to the race condition in test-bdrv-drain. The main thread
> > schedules a BH in IOThread a and then drains the BDS:
> 
> ...and another spot with the same parse issue...
> 
> > 
> >   aio_bh_schedule_oneshot(ctx_a, test_iothread_main_thread_bh, &data);
> > 
> >   /* The request is running on the IOThread a. Draining its block device
> 
> ...but here you were quoting from the existing code base, which is
> where I finally realized it was more than just your commit message.
> 
> >* will make sure that it has completed as far as the BDS is concerned,
> >* but the drain in this thread can continue immediately after
> >* bdrv_dec_in_flight() and aio_ret might be assigned only slightly
> >* later. */
> >   do_drain_begin(drain_type, bs);
> > 
> > If the BH completes before do_drain_begin() then there is nothing to
> > worry about.
> > 
> > If the BH invokes bdrv_flush() before do_drain_begin(), then
> > do_drain_begin() waits for it to complete.
> > 
> > The problematic case is when do_drain_begin() runs before the BH enters
> > bdrv_flush(). Then do_drain_begin() misses the BH and the drain
> > mechanism has failed in quiescing I/O.
> > 
> > Fix this by incrementing the in_flight counter so that do_drain_begin()
> > waits for test_iothread_main_thread_bh().
> > 
> > Signed-off-by: Stefan Hajnoczi 
> > ---
> >  tests/unit/test-bdrv-drain.c | 8 
> >  1 file changed, 8 insertions(+)
> > 
> > diff --git a/tests/unit/test-bdrv-drain.c b/tests/unit/test-bdrv-drain.c
> > index ccc453c29e..67a79aa3f0 100644
> > --- a/tests/unit/test-bdrv-drain.c
> > +++ b/tests/unit/test-bdrv-drain.c
> > @@ -512,6 +512,7 @@ static void test_iothread_main_thread_bh(void *opaque)
> >   * executed during drain, otherwise this would deadlock. */
> >  aio_context_acquire(bdrv_get_aio_context(data->bs));
> >  bdrv_flush(data->bs);
> > +bdrv_dec_in_flight(data->bs); /* incremented by test_iothread_common() */
> >  aio_context_release(bdrv_get_aio_context(data->bs));
> >  }
> >  
> > @@ -583,6 +584,13 @@ static void test_iothread_common(enum drain_type drain_type, int drain_thread)
> >  aio_context_acquire(ctx_a);
> >  }
> >  
> > +/*
> > + * Increment in_flight so that do_drain_begin() waits for
> > + * test_iothread_main_thread_bh(). This prevents the race between
> > + * test_iothread_main_thread_bh() in IOThread a and do_drain_begin() in
> > + * this thread. test_iothread_main_thread_bh() decrements in_flight.
> > + */
> > +bdrv_inc_in_flight(bs);
> > +aio_bh_schedule_oneshot(ctx_a, test_iothread_main_thread_bh, &data);
> >  
> > /* The request is running on the IOThread a. Draining its block device
> 
> and indeed, your commit message is consistent with the current code's
> naming convention.  If you have reason to respin, a pre-req patch to
> change the case before adding more references might be nice, but I
> won't insist.
> 
> Reviewed-by: Eric Blake 

Sorry about that. It is confusing.

Stefan


signature.asc
Description: PGP signature


[RFC PATCH 0/3] Refactor PPI logic/definitions for virt/sbsa-ref

2023-09-14 Thread Leif Lindholm
While reviewing Marcin's patch this morning, cross referencing different
specifications and looking at various places around the source code in
order to convinced myself he really hadn't missed something out (the
existing plumbing made it *so* clean to add), my brain broke slightly
at keeping track of PPIs/INTIDs between the various sources.

Moreover, I found the PPI() macro in virt.h to be doing the exact
opposite of what I would have expected it to (it converts a PPI to an
INTID rather than the other way around).

So I refactored stuff so that:
- PPIs defined by BSA are moved to a (new) common header.
- The _IRQ definitions for those PPIs refer to the INTIDs.
- sbsa-ref and virt both use these definitions.

This change does objectively add a bit more noise to the code, since it
means more locations need to use the PPI macro than before, but it felt
like a readability improvement to me.

Not even compilation tested, just the least confusing way of asking
whether the change could be accepted at all.

Leif Lindholm (3):
  include/hw/arm: move BSA definitions to bsa.h
  {include/}hw/arm: refactor BSA/virt PPI logic
  hw/arm/sbsa-ref: use bsa.h for PPI definitions

 hw/arm/sbsa-ref.c| 24 +++-
 hw/arm/virt-acpi-build.c |  4 ++--
 hw/arm/virt.c|  9 +
 include/hw/arm/bsa.h | 35 +++
 include/hw/arm/virt.h| 12 +---
 5 files changed, 54 insertions(+), 30 deletions(-)
 create mode 100644 include/hw/arm/bsa.h

-- 
2.30.2



[RFC PATCH 3/3] hw/arm/sbsa-ref: use bsa.h for PPI definitions

2023-09-14 Thread Leif Lindholm
Use the private peripheral interrupt definitions from bsa.h instead of
defining them locally. Refactor to use PPI() to convert from INTID macro
where necessary.

Signed-off-by: Leif Lindholm 
---
 hw/arm/sbsa-ref.c | 24 +++-
 1 file changed, 11 insertions(+), 13 deletions(-)

diff --git a/hw/arm/sbsa-ref.c b/hw/arm/sbsa-ref.c
index bc89eb4806..589b17e3bc 100644
--- a/hw/arm/sbsa-ref.c
+++ b/hw/arm/sbsa-ref.c
@@ -2,6 +2,7 @@
  * ARM SBSA Reference Platform emulation
  *
  * Copyright (c) 2018 Linaro Limited
+ * Copyright (c) 2023 Qualcomm Innovation Center, Inc. All rights reserved.
  * Written by Hongbo Zhang 
  *
  * This program is free software; you can redistribute it and/or modify it
@@ -30,6 +31,7 @@
 #include "exec/hwaddr.h"
 #include "kvm_arm.h"
 #include "hw/arm/boot.h"
+#include "hw/arm/bsa.h"
 #include "hw/arm/fdt.h"
 #include "hw/arm/smmuv3.h"
 #include "hw/block/flash.h"
@@ -55,13 +57,6 @@
 #define NUM_SMMU_IRQS   4
 #define NUM_SATA_PORTS  6
 
-#define VIRTUAL_PMU_IRQ7
-#define ARCH_GIC_MAINT_IRQ 9
-#define ARCH_TIMER_VIRT_IRQ11
-#define ARCH_TIMER_S_EL1_IRQ   13
-#define ARCH_TIMER_NS_EL1_IRQ  14
-#define ARCH_TIMER_NS_EL2_IRQ  10
-
 enum {
 SBSA_FLASH,
 SBSA_MEM,
@@ -494,15 +489,18 @@ static void create_gic(SBSAMachineState *sms, MemoryRegion *mem)
 for (irq = 0; irq < ARRAY_SIZE(timer_irq); irq++) {
 qdev_connect_gpio_out(cpudev, irq,
   qdev_get_gpio_in(sms->gic,
-   ppibase + timer_irq[irq]));
+   ppibase
+   + PPI(timer_irq[irq])));
 }
 
+irq = qdev_get_gpio_in(sms->gic,
+   ppibase + PPI(ARCH_GIC_MAINT_IRQ));
 qdev_connect_gpio_out_named(cpudev, "gicv3-maintenance-interrupt", 0,
-qdev_get_gpio_in(sms->gic, ppibase
- + ARCH_GIC_MAINT_IRQ));
-qdev_connect_gpio_out_named(cpudev, "pmu-interrupt", 0,
-qdev_get_gpio_in(sms->gic, ppibase
- + VIRTUAL_PMU_IRQ));
+irq);
+
+irq = qdev_get_gpio_in(sms->gic,
+   ppibase + PPI(VIRTUAL_PMU_IRQ));
+qdev_connect_gpio_out_named(cpudev, "pmu-interrupt", 0, irq);
 
         sysbus_connect_irq(gicbusdev, i, qdev_get_gpio_in(cpudev, ARM_CPU_IRQ));
 sysbus_connect_irq(gicbusdev, i + smp_cpus,
-- 
2.30.2




[RFC PATCH 1/3] include/hw/arm: move BSA definitions to bsa.h

2023-09-14 Thread Leif Lindholm
virt.h defines a number of IRQs that are ultimately described by Arm's
Base System Architecture specification. Move these to a dedicated header
so that they can be reused by other platforms that do the same.
Include that header from virt.h to minimise churn.

Signed-off-by: Leif Lindholm 
---
 include/hw/arm/bsa.h  | 35 +++
 include/hw/arm/virt.h | 12 +---
 2 files changed, 36 insertions(+), 11 deletions(-)
 create mode 100644 include/hw/arm/bsa.h

diff --git a/include/hw/arm/bsa.h b/include/hw/arm/bsa.h
new file mode 100644
index 00..8277b3a379
--- /dev/null
+++ b/include/hw/arm/bsa.h
@@ -0,0 +1,35 @@
+/*
+ * Common definitions for Arm Base System Architecture (BSA) platforms.
+ *
+ * Copyright (c) 2015 Linaro Limited
+ * Copyright (c) 2023 Qualcomm Innovation Center, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see .
+ *
+ */
+
+#ifndef QEMU_ARM_BSA_H
+#define QEMU_ARM_BSA_H
+
+#define ARCH_GIC_MAINT_IRQ  9
+
+#define ARCH_TIMER_VIRT_IRQ   11
+#define ARCH_TIMER_S_EL1_IRQ  13
+#define ARCH_TIMER_NS_EL1_IRQ 14
+#define ARCH_TIMER_NS_EL2_IRQ 10
+
+#define VIRTUAL_PMU_IRQ 7
+
+#define PPI(irq) ((irq) + 16)
+
+#endif /* QEMU_ARM_BSA_H */
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index e1ddbea96b..f69239850e 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -34,6 +34,7 @@
 #include "qemu/notify.h"
 #include "hw/boards.h"
 #include "hw/arm/boot.h"
+#include "hw/arm/bsa.h"
 #include "hw/block/flash.h"
 #include "sysemu/kvm.h"
 #include "hw/intc/arm_gicv3_common.h"
@@ -43,17 +44,6 @@
 #define NUM_VIRTIO_TRANSPORTS 32
 #define NUM_SMMU_IRQS  4
 
-#define ARCH_GIC_MAINT_IRQ  9
-
-#define ARCH_TIMER_VIRT_IRQ   11
-#define ARCH_TIMER_S_EL1_IRQ  13
-#define ARCH_TIMER_NS_EL1_IRQ 14
-#define ARCH_TIMER_NS_EL2_IRQ 10
-
-#define VIRTUAL_PMU_IRQ 7
-
-#define PPI(irq) ((irq) + 16)
-
 /* See Linux kernel arch/arm64/include/asm/pvclock-abi.h */
 #define PVTIME_SIZE_PER_CPU 64
 
-- 
2.30.2




[RFC PATCH 2/3] {include/}hw/arm: refactor BSA/virt PPI logic

2023-09-14 Thread Leif Lindholm
GIC Private Peripheral Interrupts (PPIs) are defined as GIC INTIDs 16-31;
that is, PPI0 is INTID16 .. PPI15 is INTID31.
Arm's Base System Architecture specification (BSA) lists the mandated and
recommended private interrupt IDs by INTID, not by PPI index, but the
current definitions in QEMU use the PPI index, complicating cross
referencing.

Meanwhile, the PPI(x) macro counterintuitively adds 16 to the input value,
converting a PPI index to an INTID.

Resolve this by redefining the BSA-allocated PPIs by their INTIDs,
inverting the logic of the PPI(x) macro and flipping where it is used.

Signed-off-by: Leif Lindholm 
---
 hw/arm/virt-acpi-build.c |  4 ++--
 hw/arm/virt.c|  9 +
 include/hw/arm/bsa.h | 14 +++---
 3 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 6b674231c2..963c58a88a 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -729,9 +729,9 @@ build_madt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
 for (i = 0; i < MACHINE(vms)->smp.cpus; i++) {
 ARMCPU *armcpu = ARM_CPU(qemu_get_cpu(i));
 uint64_t physical_base_address = 0, gich = 0, gicv = 0;
-uint32_t vgic_interrupt = vms->virt ? PPI(ARCH_GIC_MAINT_IRQ) : 0;
+uint32_t vgic_interrupt = vms->virt ? ARCH_GIC_MAINT_IRQ : 0;
 uint32_t pmu_interrupt = arm_feature(&armcpu->env, ARM_FEATURE_PMU) ?
- PPI(VIRTUAL_PMU_IRQ) : 0;
+ VIRTUAL_PMU_IRQ : 0;
 
 if (vms->gic_version == VIRT_GIC_VERSION_2) {
 physical_base_address = memmap[VIRT_GIC_CPU].base;
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 8ad78b23c2..bb70f3eec8 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -815,23 +815,24 @@ static void create_gic(VirtMachineState *vms, MemoryRegion *mem)
 for (irq = 0; irq < ARRAY_SIZE(timer_irq); irq++) {
 qdev_connect_gpio_out(cpudev, irq,
   qdev_get_gpio_in(vms->gic,
-   ppibase + timer_irq[irq]));
+   ppibase
+   + PPI(timer_irq[irq])));
 }
 
 if (vms->gic_version != VIRT_GIC_VERSION_2) {
 qemu_irq irq = qdev_get_gpio_in(vms->gic,
-ppibase + ARCH_GIC_MAINT_IRQ);
+ppibase + PPI(ARCH_GIC_MAINT_IRQ));
 qdev_connect_gpio_out_named(cpudev, "gicv3-maintenance-interrupt",
 0, irq);
 } else if (vms->virt) {
 qemu_irq irq = qdev_get_gpio_in(vms->gic,
-ppibase + ARCH_GIC_MAINT_IRQ);
+ppibase + PPI(ARCH_GIC_MAINT_IRQ));
 sysbus_connect_irq(gicbusdev, i + 4 * smp_cpus, irq);
 }
 
 qdev_connect_gpio_out_named(cpudev, "pmu-interrupt", 0,
 qdev_get_gpio_in(vms->gic, ppibase
- + VIRTUAL_PMU_IRQ));
+ + PPI(VIRTUAL_PMU_IRQ)));
 
         sysbus_connect_irq(gicbusdev, i, qdev_get_gpio_in(cpudev, ARM_CPU_IRQ));
 sysbus_connect_irq(gicbusdev, i + smp_cpus,
diff --git a/include/hw/arm/bsa.h b/include/hw/arm/bsa.h
index 8277b3a379..b7db1cacf1 100644
--- a/include/hw/arm/bsa.h
+++ b/include/hw/arm/bsa.h
@@ -21,15 +21,15 @@
 #ifndef QEMU_ARM_BSA_H
 #define QEMU_ARM_BSA_H
 
-#define ARCH_GIC_MAINT_IRQ  9
+#define ARCH_GIC_MAINT_IRQ  25
 
-#define ARCH_TIMER_VIRT_IRQ   11
-#define ARCH_TIMER_S_EL1_IRQ  13
-#define ARCH_TIMER_NS_EL1_IRQ 14
-#define ARCH_TIMER_NS_EL2_IRQ 10
+#define ARCH_TIMER_VIRT_IRQ   27
+#define ARCH_TIMER_S_EL1_IRQ  29
+#define ARCH_TIMER_NS_EL1_IRQ 30
+#define ARCH_TIMER_NS_EL2_IRQ 26
 
-#define VIRTUAL_PMU_IRQ 7
+#define VIRTUAL_PMU_IRQ 23
 
-#define PPI(irq) ((irq) + 16)
+#define PPI(irq) ((irq) - 16)
 
 #endif /* QEMU_ARM_BSA_H */
-- 
2.30.2



