Re: [PULL 0/6] ppc queue

2023-05-06 Thread Richard Henderson

On 5/5/23 17:34, Daniel Henrique Barboza wrote:

The following changes since commit a9fe9e191b4305b88c356a1ed9ac3baf89eb18aa:

   Merge tag 'pull-riscv-to-apply-20230505-1' 
of https://github.com/alistair23/qemu into staging (2023-05-05 09:25:13 +0100)

are available in the Git repository at:

   https://gitlab.com/danielhb/qemu.git  tags/pull-ppc-20230505

for you to fetch changes up to b35261b1a6c2729fa7e7a6ca34b9489eda62b744:

   hw/ppc/Kconfig: NVDIMM is a hard requirement for the pseries machine 
(2023-05-05 12:34:22 -0300)


ppc patch queue for 2023-05-05:

This queue includes fixes for ppc and spapr emulation, a build fix for
the pseries machine and a new reviewer for ppc/spapr.

We're also carrying a Coverity fix for the sm501 display.


Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/8.1 as 
appropriate.


r~




Re: [PULL 00/42] tcg patch queue

2023-05-06 Thread Richard Henderson

On 5/5/23 22:24, Richard Henderson wrote:

The following changes since commit a9fe9e191b4305b88c356a1ed9ac3baf89eb18aa:

   Merge tag 'pull-riscv-to-apply-20230505-1' 
of https://github.com/alistair23/qemu into staging (2023-05-05 09:25:13 +0100)

are available in the Git repository at:

   https://gitlab.com/rth7680/qemu.git  tags/pull-tcg-20230505

for you to fetch changes up to 35a0bd63b458f30389b6bc6b7471c1665fe7b9d8:

   tcg: Widen helper_*_st[bw]_mmu val arguments (2023-05-05 17:21:03 +0100)


softfloat: Fix the incorrect computation in float32_exp2
tcg: Remove compatability helpers for qemu ld/st
target/alpha: Remove TARGET_ALIGNED_ONLY
target/hppa: Remove TARGET_ALIGNED_ONLY
target/sparc: Remove TARGET_ALIGNED_ONLY
tcg: Cleanups preparing to unify calls to qemu_ld/st helpers


Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/8.1 as 
appropriate.


r~




[PATCH] docs/devel: remind developers to run CI container pipeline when updating images

2023-05-06 Thread Ani Sinha
When new dependencies and packages are added to containers, it's important to
run the CI container generation pipelines on gitlab to make sure that there
are no obvious conflicts between the packages being added and those already
present. Running the CI container pipelines will catch any such breakage
before we commit the change updating the containers. Add a line to the
documentation reminding developers to run the pipeline before submitting the
change. It will also make life easier for the maintainers.

Signed-off-by: Ani Sinha 
---
 docs/devel/testing.rst | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/docs/devel/testing.rst b/docs/devel/testing.rst
index 4071e72710..203facb417 100644
--- a/docs/devel/testing.rst
+++ b/docs/devel/testing.rst
@@ -479,6 +479,12 @@ first to contribute the mapping to the ``libvirt-ci`` 
project:
contains the ``mappings.yml`` update.  Then add the prerequisite and
run ``make lcitool-refresh``.
 
+ * Please also trigger gitlab container generation pipelines on your change
+   for as many OS distros as practical to make sure that there are no
+   obvious breakages when adding the new pre-requisite. Please see
+   `CI `__ documentation
+   page on how to trigger gitlab CI pipelines on your change.
+
 For enterprise distros that default to old, end-of-life versions of the
 Python runtime, QEMU uses a separate set of mappings that work with more
 recent versions.  These can be found in ``tests/lcitool/mappings.yml``.
-- 
2.31.1
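
For anyone wanting to follow this advice, one low-friction way to trigger
the container pipelines is to push the branch to a personal gitlab.com fork
and start a pipeline there.  A minimal sketch, assuming a git remote named
'gitlab' pointing at your fork and the optional 'glab' CLI (remote and
branch names here are illustrative, not taken from the patch):

    # Push the branch carrying the container changes to your fork;
    # QEMU's .gitlab-ci.yml provides the container build jobs there.
    git push gitlab my-container-update

    # Start a pipeline on that branch from the command line
    # (the "Run pipeline" button in the web UI works just as well).
    glab ci run --branch my-container-update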




Re: [PATCH v3 05/10] qapi: make the vcpu parameters deprecated for 8.1

2023-05-06 Thread Markus Armbruster
Alex Bennée  writes:

> I don't think I can remove the parameters directly but certainly mark
> them as deprecated.
>
> Message-Id: <20230420150009.1675181-6-alex.ben...@linaro.org>
> Reviewed-by: Stefan Hajnoczi 
> Reviewed-by: Richard Henderson 
> Signed-off-by: Alex Bennée 
> Message-Id: <20230503091756.1453057-6-alex.ben...@linaro.org>
> ---
>  qapi/trace.json | 22 +++---
>  1 file changed, 7 insertions(+), 15 deletions(-)
>
> diff --git a/qapi/trace.json b/qapi/trace.json
> index f425d10764..de6b1681aa 100644
> --- a/qapi/trace.json
> +++ b/qapi/trace.json
> @@ -33,9 +33,9 @@
>  #
>  # @name: Event name.
>  # @state: Tracing state.
> -# @vcpu: Whether this is a per-vCPU event (since 2.7).
> +# @vcpu: Whether this is a per-vCPU event (deprecated since 8.1).

We don't normally replace the (since ...) when we deprecate.

>  #
> -# An event is per-vCPU if it has the "vcpu" property in the "trace-events"
> +# There are no longer any events with the "vcpu" property in the 
> "trace-events"

Why would a user still need to know what @vcpu used to mean?  Also, long
line.  See below for a possible alternative.

>  # files.
>  #
>  # Since: 2.2

You need to make it official, like so:

   { 'struct': 'TraceEventInfo',
  -  'data': {'name': 'str', 'state': 'TraceEventState', 'vcpu': 'bool'} }
  +  'data': {'name': 'str', 'state': 'TraceEventState',
  +   'vcpu': { 'type': 'bool', 'features': ['deprecated'] } } }

And then the generator will demand you document it formally, so you also
need something like

 # @state: Tracing state.
 # @vcpu: Whether this is a per-vCPU event (since 2.7).
 #
-# An event is per-vCPU if it has the "vcpu" property in the "trace-events"
-# files.
+# Features:
+# @deprecated: Member @vcpu is deprecated, and always false.
 #
 # Since: 2.2
 ##

Additionally, update docs/about/deprecated.rst.

> @@ -49,19 +49,15 @@
>  # Query the state of events.
>  #
>  # @name: Event name pattern (case-sensitive glob).
> -# @vcpu: The vCPU to query (any by default; since 2.7).
> +# @vcpu: The vCPU to query (deprecated since 8.1).

Again, we don't normally replace the (since ...) when we deprecate.

I suggest to just drop the "any by default" part.

>  #
>  # Returns: a list of @TraceEventInfo for the matching events
>  #
>  #  An event is returned if:
>  #
>  #  - its name matches the @name pattern, and
> -#  - if @vcpu is given, the event has the "vcpu" property.
>  #
> -#  Therefore, if @vcpu is given, the operation will only match 
> per-vCPU events,
> -#  returning their state on the specified vCPU. Special case: if 
> @name is an
> -#  exact match, @vcpu is given and the event does not have the 
> "vcpu" property,
> -#  an error is returned.
> +#  There are no longer any per-vCPU events
>  #
>  # Since: 2.2
>  #

Please add 'features': ['deprecated'].

> @@ -84,17 +80,13 @@
>  # @name: Event name pattern (case-sensitive glob).
>  # @enable: Whether to enable tracing.
>  # @ignore-unavailable: Do not match unavailable events with @name.
> -# @vcpu: The vCPU to act upon (all by default; since 2.7).
> +# @vcpu: The vCPU to act upon (deprecated since 8.1).

Suggest to just drop the "all by default" part.

>  #
>  # An event's state is modified if:
>  #
> -# - its name matches the @name pattern, and
> -# - if @vcpu is given, the event has the "vcpu" property.
> +# - its name matches the @name pattern
>  #
> -# Therefore, if @vcpu is given, the operation will only match per-vCPU 
> events,
> -# setting their state on the specified vCPU. Special case: if @name is an 
> exact
> -# match, @vcpu is given and the event does not have the "vcpu" property, an
> -# error is returned.
> +# There are no longer any per-vCPU events, so specifying one will never match.
>  #
>  # Since: 2.2
>  #

Please add 'features': ['deprecated'].
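
Putting those requests together, the command arguments would grow the same
feature flag as the struct member.  A hedged sketch of the resulting
trace-event-get-state schema (member names and types taken from the quoted
qapi/trace.json; the exact final form is up to the respin):

    { 'command': 'trace-event-get-state',
      'data': { 'name': 'str',
                '*vcpu': { 'type': 'int', 'features': ['deprecated'] } },
      'returns': ['TraceEventInfo'] }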




[PATCH v5 18/30] tcg/riscv: Convert tcg_out_qemu_{ld,st}_slow_path

2023-05-06 Thread Richard Henderson
Use tcg_out_ld_helper_args, tcg_out_ld_helper_ret,
and tcg_out_st_helper_args.

Reviewed-by: Daniel Henrique Barboza 
Signed-off-by: Richard Henderson 
---
 tcg/riscv/tcg-target.c.inc | 37 ++---
 1 file changed, 10 insertions(+), 27 deletions(-)

diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index 2b2d313fe2..c22d1e35ac 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -906,14 +906,14 @@ static void tcg_out_goto(TCGContext *s, const 
tcg_insn_unit *target)
 tcg_debug_assert(ok);
 }
 
+/* We have three temps, we might as well expose them. */
+static const TCGLdstHelperParam ldst_helper_param = {
+.ntmp = 3, .tmp = { TCG_REG_TMP0, TCG_REG_TMP1, TCG_REG_TMP2 }
+};
+
 static bool tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
 {
-MemOpIdx oi = l->oi;
-MemOp opc = get_memop(oi);
-TCGReg a0 = tcg_target_call_iarg_regs[0];
-TCGReg a1 = tcg_target_call_iarg_regs[1];
-TCGReg a2 = tcg_target_call_iarg_regs[2];
-TCGReg a3 = tcg_target_call_iarg_regs[3];
+MemOp opc = get_memop(l->oi);
 
 /* resolve label address */
 if (!reloc_sbimm12(l->label_ptr[0], tcg_splitwx_to_rx(s->code_ptr))) {
@@ -921,13 +921,9 @@ static bool tcg_out_qemu_ld_slow_path(TCGContext *s, 
TCGLabelQemuLdst *l)
 }
 
 /* call load helper */
-tcg_out_mov(s, TCG_TYPE_PTR, a0, TCG_AREG0);
-tcg_out_mov(s, TCG_TYPE_PTR, a1, l->addrlo_reg);
-tcg_out_movi(s, TCG_TYPE_PTR, a2, oi);
-tcg_out_movi(s, TCG_TYPE_PTR, a3, (tcg_target_long)l->raddr);
-
+tcg_out_ld_helper_args(s, l, &ldst_helper_param);
 tcg_out_call_int(s, qemu_ld_helpers[opc & MO_SSIZE], false);
-tcg_out_mov(s, (opc & MO_SIZE) == MO_64, l->datalo_reg, a0);
+tcg_out_ld_helper_ret(s, l, true, &ldst_helper_param);
 
 tcg_out_goto(s, l->raddr);
 return true;
@@ -935,14 +931,7 @@ static bool tcg_out_qemu_ld_slow_path(TCGContext *s, 
TCGLabelQemuLdst *l)
 
 static bool tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
 {
-MemOpIdx oi = l->oi;
-MemOp opc = get_memop(oi);
-MemOp s_bits = opc & MO_SIZE;
-TCGReg a0 = tcg_target_call_iarg_regs[0];
-TCGReg a1 = tcg_target_call_iarg_regs[1];
-TCGReg a2 = tcg_target_call_iarg_regs[2];
-TCGReg a3 = tcg_target_call_iarg_regs[3];
-TCGReg a4 = tcg_target_call_iarg_regs[4];
+MemOp opc = get_memop(l->oi);
 
 /* resolve label address */
 if (!reloc_sbimm12(l->label_ptr[0], tcg_splitwx_to_rx(s->code_ptr))) {
@@ -950,13 +939,7 @@ static bool tcg_out_qemu_st_slow_path(TCGContext *s, 
TCGLabelQemuLdst *l)
 }
 
 /* call store helper */
-tcg_out_mov(s, TCG_TYPE_PTR, a0, TCG_AREG0);
-tcg_out_mov(s, TCG_TYPE_PTR, a1, l->addrlo_reg);
-tcg_out_movext(s, s_bits == MO_64 ? TCG_TYPE_I64 : TCG_TYPE_I32, a2,
-   l->type, s_bits, l->datalo_reg);
-tcg_out_movi(s, TCG_TYPE_PTR, a3, oi);
-tcg_out_movi(s, TCG_TYPE_PTR, a4, (tcg_target_long)l->raddr);
-
+tcg_out_st_helper_args(s, l, &ldst_helper_param);
 tcg_out_call_int(s, qemu_st_helpers[opc & MO_SIZE], false);
 
 tcg_out_goto(s, l->raddr);
-- 
2.34.1




[PATCH v5 05/30] tcg/loongarch64: Introduce prepare_host_addr

2023-05-06 Thread Richard Henderson
Merge tcg_out_tlb_load, add_qemu_ldst_label, tcg_out_test_alignment,
tcg_out_zext_addr_if_32_bit, and some code that lived in both
tcg_out_qemu_ld and tcg_out_qemu_st into one function that returns
HostAddress and TCGLabelQemuLdst structures.

Signed-off-by: Richard Henderson 
---
 tcg/loongarch64/tcg-target.c.inc | 255 +--
 1 file changed, 105 insertions(+), 150 deletions(-)

diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index 6a87a5e5a3..2f2c34b930 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -818,81 +818,12 @@ static void * const qemu_st_helpers[4] = {
 [MO_64] = helper_le_stq_mmu,
 };
 
-/* We expect to use a 12-bit negative offset from ENV.  */
-QEMU_BUILD_BUG_ON(TLB_MASK_TABLE_OFS(0) > 0);
-QEMU_BUILD_BUG_ON(TLB_MASK_TABLE_OFS(0) < -(1 << 11));
-
 static bool tcg_out_goto(TCGContext *s, const tcg_insn_unit *target)
 {
 tcg_out_opc_b(s, 0);
 return reloc_br_sd10k16(s->code_ptr - 1, target);
 }
 
-/*
- * Emits common code for TLB addend lookup, that eventually loads the
- * addend in TCG_REG_TMP2.
- */
-static void tcg_out_tlb_load(TCGContext *s, TCGReg addrl, MemOpIdx oi,
- tcg_insn_unit **label_ptr, bool is_load)
-{
-MemOp opc = get_memop(oi);
-unsigned s_bits = opc & MO_SIZE;
-unsigned a_bits = get_alignment_bits(opc);
-tcg_target_long compare_mask;
-int mem_index = get_mmuidx(oi);
-int fast_ofs = TLB_MASK_TABLE_OFS(mem_index);
-int mask_ofs = fast_ofs + offsetof(CPUTLBDescFast, mask);
-int table_ofs = fast_ofs + offsetof(CPUTLBDescFast, table);
-
-tcg_out_ld(s, TCG_TYPE_PTR, TCG_REG_TMP0, TCG_AREG0, mask_ofs);
-tcg_out_ld(s, TCG_TYPE_PTR, TCG_REG_TMP1, TCG_AREG0, table_ofs);
-
-tcg_out_opc_srli_d(s, TCG_REG_TMP2, addrl,
-TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS);
-tcg_out_opc_and(s, TCG_REG_TMP2, TCG_REG_TMP2, TCG_REG_TMP0);
-tcg_out_opc_add_d(s, TCG_REG_TMP2, TCG_REG_TMP2, TCG_REG_TMP1);
-
-/* Load the tlb comparator and the addend.  */
-tcg_out_ld(s, TCG_TYPE_TL, TCG_REG_TMP0, TCG_REG_TMP2,
-   is_load ? offsetof(CPUTLBEntry, addr_read)
-   : offsetof(CPUTLBEntry, addr_write));
-tcg_out_ld(s, TCG_TYPE_PTR, TCG_REG_TMP2, TCG_REG_TMP2,
-   offsetof(CPUTLBEntry, addend));
-
-/* We don't support unaligned accesses.  */
-if (a_bits < s_bits) {
-a_bits = s_bits;
-}
-/* Clear the non-page, non-alignment bits from the address.  */
-compare_mask = (tcg_target_long)TARGET_PAGE_MASK | ((1 << a_bits) - 1);
-tcg_out_movi(s, TCG_TYPE_TL, TCG_REG_TMP1, compare_mask);
-tcg_out_opc_and(s, TCG_REG_TMP1, TCG_REG_TMP1, addrl);
-
-/* Compare masked address with the TLB entry.  */
-label_ptr[0] = s->code_ptr;
-tcg_out_opc_bne(s, TCG_REG_TMP0, TCG_REG_TMP1, 0);
-
-/* TLB Hit - addend in TCG_REG_TMP2, ready for use.  */
-}
-
-static void add_qemu_ldst_label(TCGContext *s, int is_ld, MemOpIdx oi,
-TCGType type,
-TCGReg datalo, TCGReg addrlo,
-void *raddr, tcg_insn_unit **label_ptr)
-{
-TCGLabelQemuLdst *label = new_ldst_label(s);
-
-label->is_ld = is_ld;
-label->oi = oi;
-label->type = type;
-label->datalo_reg = datalo;
-label->datahi_reg = 0; /* unused */
-label->addrlo_reg = addrlo;
-label->addrhi_reg = 0; /* unused */
-label->raddr = tcg_splitwx_to_rx(raddr);
-label->label_ptr[0] = label_ptr[0];
-}
-
 static bool tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
 {
 MemOpIdx oi = l->oi;
@@ -941,33 +872,6 @@ static bool tcg_out_qemu_st_slow_path(TCGContext *s, 
TCGLabelQemuLdst *l)
 return tcg_out_goto(s, l->raddr);
 }
 #else
-
-/*
- * Alignment helpers for user-mode emulation
- */
-
-static void tcg_out_test_alignment(TCGContext *s, bool is_ld, TCGReg addr_reg,
-   unsigned a_bits)
-{
-TCGLabelQemuLdst *l = new_ldst_label(s);
-
-l->is_ld = is_ld;
-l->addrlo_reg = addr_reg;
-
-/*
- * Without micro-architecture details, we don't know which of bstrpick or
- * andi is faster, so use bstrpick as it's not constrained by imm field
- * width. (Not to say alignments >= 2^12 are going to happen any time
- * soon, though)
- */
-tcg_out_opc_bstrpick_d(s, TCG_REG_TMP1, addr_reg, 0, a_bits - 1);
-
-l->label_ptr[0] = s->code_ptr;
-tcg_out_opc_bne(s, TCG_REG_TMP1, TCG_REG_ZERO, 0);
-
-l->raddr = tcg_splitwx_to_rx(s->code_ptr);
-}
-
 static bool tcg_out_fail_alignment(TCGContext *s, TCGLabelQemuLdst *l)
 {
 /* resolve label address */
@@ -997,27 +901,102 @@ static bool tcg_out_qemu_st_slow_path(TCGContext *s, 
TCGLabelQemuLdst *l)
 
 #endif /* CONFIG_SOFTMMU */
 
-/*
- * `ext32u` the address register into the temp register given,
- * if target is 32-bit, no-op otherwise.
- *
- * Returns

[PATCH v5 23/30] tcg/mips: Simplify constraints on qemu_ld/st

2023-05-06 Thread Richard Henderson
The softmmu tlb uses TCG_REG_TMP[0-3], not any of the normally available
registers.  Now that we handle overlap between inputs and helper arguments,
and have eliminated use of A0, we can allow any allocatable reg.

Signed-off-by: Richard Henderson 
---
 tcg/mips/tcg-target-con-set.h | 13 +
 tcg/mips/tcg-target-con-str.h |  2 --
 tcg/mips/tcg-target.c.inc | 30 --
 3 files changed, 13 insertions(+), 32 deletions(-)

diff --git a/tcg/mips/tcg-target-con-set.h b/tcg/mips/tcg-target-con-set.h
index fe3e868a2f..864034f468 100644
--- a/tcg/mips/tcg-target-con-set.h
+++ b/tcg/mips/tcg-target-con-set.h
@@ -12,15 +12,13 @@
 C_O0_I1(r)
 C_O0_I2(rZ, r)
 C_O0_I2(rZ, rZ)
-C_O0_I2(SZ, S)
-C_O0_I3(SZ, S, S)
-C_O0_I3(SZ, SZ, S)
+C_O0_I3(rZ, r, r)
+C_O0_I3(rZ, rZ, r)
 C_O0_I4(rZ, rZ, rZ, rZ)
-C_O0_I4(SZ, SZ, S, S)
-C_O1_I1(r, L)
+C_O0_I4(rZ, rZ, r, r)
 C_O1_I1(r, r)
 C_O1_I2(r, 0, rZ)
-C_O1_I2(r, L, L)
+C_O1_I2(r, r, r)
 C_O1_I2(r, r, ri)
 C_O1_I2(r, r, rI)
 C_O1_I2(r, r, rIK)
@@ -30,7 +28,6 @@ C_O1_I2(r, rZ, rN)
 C_O1_I2(r, rZ, rZ)
 C_O1_I4(r, rZ, rZ, rZ, 0)
 C_O1_I4(r, rZ, rZ, rZ, rZ)
-C_O2_I1(r, r, L)
-C_O2_I2(r, r, L, L)
+C_O2_I1(r, r, r)
 C_O2_I2(r, r, r, r)
 C_O2_I4(r, r, rZ, rZ, rN, rN)
diff --git a/tcg/mips/tcg-target-con-str.h b/tcg/mips/tcg-target-con-str.h
index e4b2965c72..413c280a7a 100644
--- a/tcg/mips/tcg-target-con-str.h
+++ b/tcg/mips/tcg-target-con-str.h
@@ -9,8 +9,6 @@
  * REGS(letter, register_mask)
  */
 REGS('r', ALL_GENERAL_REGS)
-REGS('L', ALL_QLOAD_REGS)
-REGS('S', ALL_QSTORE_REGS)
 
 /*
  * Define constraint letters for constants:
diff --git a/tcg/mips/tcg-target.c.inc b/tcg/mips/tcg-target.c.inc
index 695c137023..5ad9867882 100644
--- a/tcg/mips/tcg-target.c.inc
+++ b/tcg/mips/tcg-target.c.inc
@@ -176,20 +176,6 @@ static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
 #define TCG_CT_CONST_WSZ  0x2000   /* word size */
 
 #define ALL_GENERAL_REGS  0xu
-#define NOA0_REGS (ALL_GENERAL_REGS & ~(1 << TCG_REG_A0))
-
-#ifdef CONFIG_SOFTMMU
-#define ALL_QLOAD_REGS \
-(NOA0_REGS & ~((TCG_TARGET_REG_BITS < TARGET_LONG_BITS) << TCG_REG_A2))
-#define ALL_QSTORE_REGS \
-(NOA0_REGS & ~(TCG_TARGET_REG_BITS < TARGET_LONG_BITS   \
-   ? (1 << TCG_REG_A2) | (1 << TCG_REG_A3)  \
-   : (1 << TCG_REG_A1)))
-#else
-#define ALL_QLOAD_REGS   NOA0_REGS
-#define ALL_QSTORE_REGS  NOA0_REGS
-#endif
-
 
 static bool is_p2m1(tcg_target_long val)
 {
@@ -2232,18 +2218,18 @@ static TCGConstraintSetIndex 
tcg_target_op_def(TCGOpcode op)
 
 case INDEX_op_qemu_ld_i32:
 return (TCG_TARGET_REG_BITS == 64 || TARGET_LONG_BITS == 32
-? C_O1_I1(r, L) : C_O1_I2(r, L, L));
+? C_O1_I1(r, r) : C_O1_I2(r, r, r));
 case INDEX_op_qemu_st_i32:
 return (TCG_TARGET_REG_BITS == 64 || TARGET_LONG_BITS == 32
-? C_O0_I2(SZ, S) : C_O0_I3(SZ, S, S));
+? C_O0_I2(rZ, r) : C_O0_I3(rZ, r, r));
 case INDEX_op_qemu_ld_i64:
-return (TCG_TARGET_REG_BITS == 64 ? C_O1_I1(r, L)
-: TARGET_LONG_BITS == 32 ? C_O2_I1(r, r, L)
-: C_O2_I2(r, r, L, L));
+return (TCG_TARGET_REG_BITS == 64 ? C_O1_I1(r, r)
+: TARGET_LONG_BITS == 32 ? C_O2_I1(r, r, r)
+: C_O2_I2(r, r, r, r));
 case INDEX_op_qemu_st_i64:
-return (TCG_TARGET_REG_BITS == 64 ? C_O0_I2(SZ, S)
-: TARGET_LONG_BITS == 32 ? C_O0_I3(SZ, SZ, S)
-: C_O0_I4(SZ, SZ, S, S));
+return (TCG_TARGET_REG_BITS == 64 ? C_O0_I2(rZ, r)
+: TARGET_LONG_BITS == 32 ? C_O0_I3(rZ, rZ, r)
+: C_O0_I4(rZ, rZ, r, r));
 
 default:
 g_assert_not_reached();
-- 
2.34.1
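
For readers not steeped in the constraint machinery: the C_O<n>_I<m>()
names above encode <n> outputs followed by <m> inputs.  An informal gloss
of the letters used in this hunk (my reading of the con-str definitions,
not part of the patch):

    C_O1_I1(r, r)          1 output, 1 input, each in any general register
    C_O0_I2(rZ, r)         no outputs; 2 inputs, the first of which may
                           fold constant 0 onto the hardware zero register
    C_O0_I4(rZ, rZ, r, r)  no outputs; 4 inputs, e.g. a 64-bit store of a
                           register pair on a 32-bit host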




[PATCH v5 19/30] tcg/s390x: Convert tcg_out_qemu_{ld,st}_slow_path

2023-05-06 Thread Richard Henderson
Use tcg_out_ld_helper_args, tcg_out_ld_helper_ret,
and tcg_out_st_helper_args.

Signed-off-by: Richard Henderson 
---
 tcg/s390x/tcg-target.c.inc | 35 ++-
 1 file changed, 10 insertions(+), 25 deletions(-)

diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
index c3157d22be..dfcf4d9e34 100644
--- a/tcg/s390x/tcg-target.c.inc
+++ b/tcg/s390x/tcg-target.c.inc
@@ -1718,26 +1718,22 @@ static void tcg_out_qemu_st_direct(TCGContext *s, MemOp 
opc, TCGReg data,
 }
 
 #if defined(CONFIG_SOFTMMU)
+static const TCGLdstHelperParam ldst_helper_param = {
+.ntmp = 1, .tmp = { TCG_TMP0 }
+};
+
 static bool tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
 {
-TCGReg addr_reg = lb->addrlo_reg;
-TCGReg data_reg = lb->datalo_reg;
-MemOpIdx oi = lb->oi;
-MemOp opc = get_memop(oi);
+MemOp opc = get_memop(lb->oi);
 
 if (!patch_reloc(lb->label_ptr[0], R_390_PC16DBL,
  (intptr_t)tcg_splitwx_to_rx(s->code_ptr), 2)) {
 return false;
 }
 
-tcg_out_mov(s, TCG_TYPE_PTR, TCG_REG_R2, TCG_AREG0);
-if (TARGET_LONG_BITS == 64) {
-tcg_out_mov(s, TCG_TYPE_I64, TCG_REG_R3, addr_reg);
-}
-tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_R4, oi);
-tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R5, (uintptr_t)lb->raddr);
-tcg_out_call_int(s, qemu_ld_helpers[opc & (MO_BSWAP | MO_SSIZE)]);
-tcg_out_mov(s, TCG_TYPE_I64, data_reg, TCG_REG_R2);
+tcg_out_ld_helper_args(s, lb, &ldst_helper_param);
+tcg_out_call_int(s, qemu_ld_helpers[opc & (MO_BSWAP | MO_SIZE)]);
+tcg_out_ld_helper_ret(s, lb, false, &ldst_helper_param);
 
 tgen_gotoi(s, S390_CC_ALWAYS, lb->raddr);
 return true;
@@ -1745,25 +1741,14 @@ static bool tcg_out_qemu_ld_slow_path(TCGContext *s, 
TCGLabelQemuLdst *lb)
 
 static bool tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
 {
-TCGReg addr_reg = lb->addrlo_reg;
-TCGReg data_reg = lb->datalo_reg;
-MemOpIdx oi = lb->oi;
-MemOp opc = get_memop(oi);
-MemOp size = opc & MO_SIZE;
+MemOp opc = get_memop(lb->oi);
 
 if (!patch_reloc(lb->label_ptr[0], R_390_PC16DBL,
  (intptr_t)tcg_splitwx_to_rx(s->code_ptr), 2)) {
 return false;
 }
 
-tcg_out_mov(s, TCG_TYPE_PTR, TCG_REG_R2, TCG_AREG0);
-if (TARGET_LONG_BITS == 64) {
-tcg_out_mov(s, TCG_TYPE_I64, TCG_REG_R3, addr_reg);
-}
-tcg_out_movext(s, size == MO_64 ? TCG_TYPE_I64 : TCG_TYPE_I32,
-   TCG_REG_R4, lb->type, size, data_reg);
-tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_R5, oi);
-tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R6, (uintptr_t)lb->raddr);
+tcg_out_st_helper_args(s, lb, &ldst_helper_param);
 tcg_out_call_int(s, qemu_st_helpers[opc & (MO_BSWAP | MO_SIZE)]);
 
 tgen_gotoi(s, S390_CC_ALWAYS, lb->raddr);
-- 
2.34.1




[PATCH v5 14/30] tcg/arm: Convert tcg_out_qemu_{ld,st}_slow_path

2023-05-06 Thread Richard Henderson
Use tcg_out_ld_helper_args, tcg_out_ld_helper_ret,
and tcg_out_st_helper_args.  This allows our local
tcg_out_arg_* infrastructure to be removed.

Signed-off-by: Richard Henderson 
---
 tcg/arm/tcg-target.c.inc | 140 +--
 1 file changed, 18 insertions(+), 122 deletions(-)

diff --git a/tcg/arm/tcg-target.c.inc b/tcg/arm/tcg-target.c.inc
index c744512778..df514e56fc 100644
--- a/tcg/arm/tcg-target.c.inc
+++ b/tcg/arm/tcg-target.c.inc
@@ -690,8 +690,8 @@ tcg_out_ldrd_rwb(TCGContext *s, ARMCond cond, TCGReg rt, 
TCGReg rn, TCGReg rm)
 tcg_out_memop_r(s, cond, INSN_LDRD_REG, rt, rn, rm, 1, 1, 1);
 }
 
-static void tcg_out_strd_8(TCGContext *s, ARMCond cond, TCGReg rt,
-   TCGReg rn, int imm8)
+static void __attribute__((unused))
+tcg_out_strd_8(TCGContext *s, ARMCond cond, TCGReg rt, TCGReg rn, int imm8)
 {
 tcg_out_memop_8(s, cond, INSN_STRD_IMM, rt, rn, imm8, 1, 0);
 }
@@ -969,28 +969,16 @@ static void tcg_out_ext8u(TCGContext *s, TCGReg rd, 
TCGReg rn)
 tcg_out_dat_imm(s, COND_AL, ARITH_AND, rd, rn, 0xff);
 }
 
-static void __attribute__((unused))
-tcg_out_ext8u_cond(TCGContext *s, ARMCond cond, TCGReg rd, TCGReg rn)
-{
-tcg_out_dat_imm(s, cond, ARITH_AND, rd, rn, 0xff);
-}
-
 static void tcg_out_ext16s(TCGContext *s, TCGType t, TCGReg rd, TCGReg rn)
 {
 /* sxth */
 tcg_out32(s, 0x06bf0070 | (COND_AL << 28) | (rd << 12) | rn);
 }
 
-static void tcg_out_ext16u_cond(TCGContext *s, ARMCond cond,
-TCGReg rd, TCGReg rn)
-{
-/* uxth */
-tcg_out32(s, 0x06ff0070 | (cond << 28) | (rd << 12) | rn);
-}
-
 static void tcg_out_ext16u(TCGContext *s, TCGReg rd, TCGReg rn)
 {
-tcg_out_ext16u_cond(s, COND_AL, rd, rn);
+/* uxth */
+tcg_out32(s, 0x06ff0070 | (COND_AL << 28) | (rd << 12) | rn);
 }
 
 static void tcg_out_ext32s(TCGContext *s, TCGReg rd, TCGReg rn)
@@ -1382,92 +1370,29 @@ static void * const qemu_st_helpers[MO_SIZE + 1] = {
 #endif
 };
 
-/* Helper routines for marshalling helper function arguments into
- * the correct registers and stack.
- * argreg is where we want to put this argument, arg is the argument itself.
- * Return value is the updated argreg ready for the next call.
- * Note that argreg 0..3 is real registers, 4+ on stack.
- *
- * We provide routines for arguments which are: immediate, 32 bit
- * value in register, 16 and 8 bit values in register (which must be zero
- * extended before use) and 64 bit value in a lo:hi register pair.
- */
-#define DEFINE_TCG_OUT_ARG(NAME, ARGTYPE, MOV_ARG, EXT_ARG)\
-static TCGReg NAME(TCGContext *s, TCGReg argreg, ARGTYPE arg)  \
-{  \
-if (argreg < 4) {  \
-MOV_ARG(s, COND_AL, argreg, arg);  \
-} else {   \
-int ofs = (argreg - 4) * 4;\
-EXT_ARG;   \
-tcg_debug_assert(ofs + 4 <= TCG_STATIC_CALL_ARGS_SIZE);\
-tcg_out_st32_12(s, COND_AL, arg, TCG_REG_CALL_STACK, ofs); \
-}  \
-return argreg + 1; \
-}
-
-DEFINE_TCG_OUT_ARG(tcg_out_arg_imm32, uint32_t, tcg_out_movi32,
-(tcg_out_movi32(s, COND_AL, TCG_REG_TMP, arg), arg = TCG_REG_TMP))
-DEFINE_TCG_OUT_ARG(tcg_out_arg_reg8, TCGReg, tcg_out_ext8u_cond,
-(tcg_out_ext8u_cond(s, COND_AL, TCG_REG_TMP, arg), arg = TCG_REG_TMP))
-DEFINE_TCG_OUT_ARG(tcg_out_arg_reg16, TCGReg, tcg_out_ext16u_cond,
-(tcg_out_ext16u_cond(s, COND_AL, TCG_REG_TMP, arg), arg = TCG_REG_TMP))
-DEFINE_TCG_OUT_ARG(tcg_out_arg_reg32, TCGReg, tcg_out_mov_reg, )
-
-static TCGReg tcg_out_arg_reg64(TCGContext *s, TCGReg argreg,
-TCGReg arglo, TCGReg arghi)
+static TCGReg ldst_ra_gen(TCGContext *s, const TCGLabelQemuLdst *l, int arg)
 {
-/* 64 bit arguments must go in even/odd register pairs
- * and in 8-aligned stack slots.
- */
-if (argreg & 1) {
-argreg++;
-}
-if (argreg >= 4 && (arglo & 1) == 0 && arghi == arglo + 1) {
-tcg_out_strd_8(s, COND_AL, arglo,
-   TCG_REG_CALL_STACK, (argreg - 4) * 4);
-return argreg + 2;
-} else {
-argreg = tcg_out_arg_reg32(s, argreg, arglo);
-argreg = tcg_out_arg_reg32(s, argreg, arghi);
-return argreg;
-}
+/* We arrive at the slow path via "BLNE", so R14 contains l->raddr. */
+return TCG_REG_R14;
 }
 
+static const TCGLdstHelperParam ldst_helper_param = {
+.ra_gen = ldst_ra_gen,
+.ntmp = 1,
+.tmp = { TCG_REG_TMP },
+};
+
 static bool tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
 {
-TCG

[PATCH v5 01/30] tcg/i386: Introduce prepare_host_addr

2023-05-06 Thread Richard Henderson
Merge tcg_out_tlb_load, add_qemu_ldst_label,
tcg_out_test_alignment, and some code that lived in both
tcg_out_qemu_ld and tcg_out_qemu_st into one function
that returns HostAddress and TCGLabelQemuLdst structures.

Signed-off-by: Richard Henderson 
---
 tcg/i386/tcg-target.c.inc | 344 --
 1 file changed, 143 insertions(+), 201 deletions(-)

diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index aae698121a..237b154194 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -1802,135 +1802,6 @@ static void * const qemu_st_helpers[(MO_SIZE | 
MO_BSWAP) + 1] = {
 [MO_BEUQ] = helper_be_stq_mmu,
 };
 
-/* Perform the TLB load and compare.
-
-   Inputs:
-   ADDRLO and ADDRHI contain the low and high part of the address.
-
-   MEM_INDEX and S_BITS are the memory context and log2 size of the load.
-
-   WHICH is the offset into the CPUTLBEntry structure of the slot to read.
-   This should be offsetof addr_read or addr_write.
-
-   Outputs:
-   LABEL_PTRS is filled with 1 (32-bit addresses) or 2 (64-bit addresses)
-   positions of the displacements of forward jumps to the TLB miss case.
-
-   Second argument register is loaded with the low part of the address.
-   In the TLB hit case, it has been adjusted as indicated by the TLB
-   and so is a host address.  In the TLB miss case, it continues to
-   hold a guest address.
-
-   First argument register is clobbered.  */
-
-static inline void tcg_out_tlb_load(TCGContext *s, TCGReg addrlo, TCGReg 
addrhi,
-int mem_index, MemOp opc,
-tcg_insn_unit **label_ptr, int which)
-{
-TCGType ttype = TCG_TYPE_I32;
-TCGType tlbtype = TCG_TYPE_I32;
-int trexw = 0, hrexw = 0, tlbrexw = 0;
-unsigned a_bits = get_alignment_bits(opc);
-unsigned s_bits = opc & MO_SIZE;
-unsigned a_mask = (1 << a_bits) - 1;
-unsigned s_mask = (1 << s_bits) - 1;
-target_ulong tlb_mask;
-
-if (TCG_TARGET_REG_BITS == 64) {
-if (TARGET_LONG_BITS == 64) {
-ttype = TCG_TYPE_I64;
-trexw = P_REXW;
-}
-if (TCG_TYPE_PTR == TCG_TYPE_I64) {
-hrexw = P_REXW;
-if (TARGET_PAGE_BITS + CPU_TLB_DYN_MAX_BITS > 32) {
-tlbtype = TCG_TYPE_I64;
-tlbrexw = P_REXW;
-}
-}
-}
-
-tcg_out_mov(s, tlbtype, TCG_REG_L0, addrlo);
-tcg_out_shifti(s, SHIFT_SHR + tlbrexw, TCG_REG_L0,
-   TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS);
-
-tcg_out_modrm_offset(s, OPC_AND_GvEv + trexw, TCG_REG_L0, TCG_AREG0,
- TLB_MASK_TABLE_OFS(mem_index) +
- offsetof(CPUTLBDescFast, mask));
-
-tcg_out_modrm_offset(s, OPC_ADD_GvEv + hrexw, TCG_REG_L0, TCG_AREG0,
- TLB_MASK_TABLE_OFS(mem_index) +
- offsetof(CPUTLBDescFast, table));
-
-/* If the required alignment is at least as large as the access, simply
-   copy the address and mask.  For lesser alignments, check that we don't
-   cross pages for the complete access.  */
-if (a_bits >= s_bits) {
-tcg_out_mov(s, ttype, TCG_REG_L1, addrlo);
-} else {
-tcg_out_modrm_offset(s, OPC_LEA + trexw, TCG_REG_L1,
- addrlo, s_mask - a_mask);
-}
-tlb_mask = (target_ulong)TARGET_PAGE_MASK | a_mask;
-tgen_arithi(s, ARITH_AND + trexw, TCG_REG_L1, tlb_mask, 0);
-
-/* cmp 0(TCG_REG_L0), TCG_REG_L1 */
-tcg_out_modrm_offset(s, OPC_CMP_GvEv + trexw,
- TCG_REG_L1, TCG_REG_L0, which);
-
-/* Prepare for both the fast path add of the tlb addend, and the slow
-   path function argument setup.  */
-tcg_out_mov(s, ttype, TCG_REG_L1, addrlo);
-
-/* jne slow_path */
-tcg_out_opc(s, OPC_JCC_long + JCC_JNE, 0, 0, 0);
-label_ptr[0] = s->code_ptr;
-s->code_ptr += 4;
-
-if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
-/* cmp 4(TCG_REG_L0), addrhi */
-tcg_out_modrm_offset(s, OPC_CMP_GvEv, addrhi, TCG_REG_L0, which + 4);
-
-/* jne slow_path */
-tcg_out_opc(s, OPC_JCC_long + JCC_JNE, 0, 0, 0);
-label_ptr[1] = s->code_ptr;
-s->code_ptr += 4;
-}
-
-/* TLB Hit.  */
-
-/* add addend(TCG_REG_L0), TCG_REG_L1 */
-tcg_out_modrm_offset(s, OPC_ADD_GvEv + hrexw, TCG_REG_L1, TCG_REG_L0,
- offsetof(CPUTLBEntry, addend));
-}
-
-/*
- * Record the context of a call to the out of line helper code for the slow 
path
- * for a load or store, so that we can later generate the correct helper code
- */
-static void add_qemu_ldst_label(TCGContext *s, bool is_ld,
-TCGType type, MemOpIdx oi,
-TCGReg datalo, TCGReg datahi,
-TCGReg addrlo, TCGReg addrhi,
-tcg_insn_unit *raddr,
-   

[PATCH v5 07/30] tcg/ppc: Introduce prepare_host_addr

2023-05-06 Thread Richard Henderson
Merge tcg_out_tlb_load, add_qemu_ldst_label, tcg_out_test_alignment,
and some code that lived in both tcg_out_qemu_ld and tcg_out_qemu_st
into one function that returns HostAddress and TCGLabelQemuLdst structures.

Signed-off-by: Richard Henderson 
---
 tcg/ppc/tcg-target.c.inc | 377 +--
 1 file changed, 168 insertions(+), 209 deletions(-)

diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
index cd473deb36..7239335bdf 100644
--- a/tcg/ppc/tcg-target.c.inc
+++ b/tcg/ppc/tcg-target.c.inc
@@ -2003,140 +2003,6 @@ static void * const qemu_st_helpers[(MO_SIZE | 
MO_BSWAP) + 1] = {
 [MO_BEUQ] = helper_be_stq_mmu,
 };
 
-/* We expect to use a 16-bit negative offset from ENV.  */
-QEMU_BUILD_BUG_ON(TLB_MASK_TABLE_OFS(0) > 0);
-QEMU_BUILD_BUG_ON(TLB_MASK_TABLE_OFS(0) < -32768);
-
-/* Perform the TLB load and compare.  Places the result of the comparison
-   in CR7, loads the addend of the TLB into R3, and returns the register
-   containing the guest address (zero-extended into R4).  Clobbers R0 and R2. 
*/
-
-static TCGReg tcg_out_tlb_read(TCGContext *s, MemOp opc,
-   TCGReg addrlo, TCGReg addrhi,
-   int mem_index, bool is_read)
-{
-int cmp_off
-= (is_read
-   ? offsetof(CPUTLBEntry, addr_read)
-   : offsetof(CPUTLBEntry, addr_write));
-int fast_off = TLB_MASK_TABLE_OFS(mem_index);
-int mask_off = fast_off + offsetof(CPUTLBDescFast, mask);
-int table_off = fast_off + offsetof(CPUTLBDescFast, table);
-unsigned s_bits = opc & MO_SIZE;
-unsigned a_bits = get_alignment_bits(opc);
-
-/* Load tlb_mask[mmu_idx] and tlb_table[mmu_idx].  */
-tcg_out_ld(s, TCG_TYPE_PTR, TCG_REG_R3, TCG_AREG0, mask_off);
-tcg_out_ld(s, TCG_TYPE_PTR, TCG_REG_R4, TCG_AREG0, table_off);
-
-/* Extract the page index, shifted into place for tlb index.  */
-if (TCG_TARGET_REG_BITS == 32) {
-tcg_out_shri32(s, TCG_REG_TMP1, addrlo,
-   TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS);
-} else {
-tcg_out_shri64(s, TCG_REG_TMP1, addrlo,
-   TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS);
-}
-tcg_out32(s, AND | SAB(TCG_REG_R3, TCG_REG_R3, TCG_REG_TMP1));
-
-/* Load the TLB comparator.  */
-if (cmp_off == 0 && TCG_TARGET_REG_BITS >= TARGET_LONG_BITS) {
-uint32_t lxu = (TCG_TARGET_REG_BITS == 32 || TARGET_LONG_BITS == 32
-? LWZUX : LDUX);
-tcg_out32(s, lxu | TAB(TCG_REG_TMP1, TCG_REG_R3, TCG_REG_R4));
-} else {
-tcg_out32(s, ADD | TAB(TCG_REG_R3, TCG_REG_R3, TCG_REG_R4));
-if (TCG_TARGET_REG_BITS < TARGET_LONG_BITS) {
-tcg_out_ld(s, TCG_TYPE_I32, TCG_REG_TMP1, TCG_REG_R3, cmp_off + 4);
-tcg_out_ld(s, TCG_TYPE_I32, TCG_REG_R4, TCG_REG_R3, cmp_off);
-} else {
-tcg_out_ld(s, TCG_TYPE_TL, TCG_REG_TMP1, TCG_REG_R3, cmp_off);
-}
-}
-
-/* Load the TLB addend for use on the fast path.  Do this asap
-   to minimize any load use delay.  */
-tcg_out_ld(s, TCG_TYPE_PTR, TCG_REG_R3, TCG_REG_R3,
-   offsetof(CPUTLBEntry, addend));
-
-/* Clear the non-page, non-alignment bits from the address */
-if (TCG_TARGET_REG_BITS == 32) {
-/* We don't support unaligned accesses on 32-bits.
- * Preserve the bottom bits and thus trigger a comparison
- * failure on unaligned accesses.
- */
-if (a_bits < s_bits) {
-a_bits = s_bits;
-}
-tcg_out_rlw(s, RLWINM, TCG_REG_R0, addrlo, 0,
-(32 - a_bits) & 31, 31 - TARGET_PAGE_BITS);
-} else {
-TCGReg t = addrlo;
-
-/* If the access is unaligned, we need to make sure we fail if we
- * cross a page boundary.  The trick is to add the access size-1
- * to the address before masking the low bits.  That will make the
- * address overflow to the next page if we cross a page boundary,
- * which will then force a mismatch of the TLB compare.
- */
-if (a_bits < s_bits) {
-unsigned a_mask = (1 << a_bits) - 1;
-unsigned s_mask = (1 << s_bits) - 1;
-tcg_out32(s, ADDI | TAI(TCG_REG_R0, t, s_mask - a_mask));
-t = TCG_REG_R0;
-}
-
-/* Mask the address for the requested alignment.  */
-if (TARGET_LONG_BITS == 32) {
-tcg_out_rlw(s, RLWINM, TCG_REG_R0, t, 0,
-(32 - a_bits) & 31, 31 - TARGET_PAGE_BITS);
-/* Zero-extend the address for use in the final address.  */
-tcg_out_ext32u(s, TCG_REG_R4, addrlo);
-addrlo = TCG_REG_R4;
-} else if (a_bits == 0) {
-tcg_out_rld(s, RLDICR, TCG_REG_R0, t, 0, 63 - TARGET_PAGE_BITS);
-} else {
-tcg_out_rld(s, RLDICL, TCG_REG_R0, t,
-64 - TARGET_PAGE_BITS, TARGET_PAGE_BITS - a_bits);
-

[PATCH v5 02/30] tcg/i386: Use indexed addressing for softmmu fast path

2023-05-06 Thread Richard Henderson
Since tcg_out_{ld,st}_helper_args, the slow path no longer requires
the address argument to be set up by the tlb load sequence.  Use a
plain load for the addend and indexed addressing with the original
input address register.

Signed-off-by: Richard Henderson 
---
 tcg/i386/tcg-target.c.inc | 25 ++---
 1 file changed, 10 insertions(+), 15 deletions(-)

diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index 237b154194..8752968af2 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -1837,7 +1837,8 @@ static bool tcg_out_qemu_ld_slow_path(TCGContext *s, 
TCGLabelQemuLdst *l)
 tcg_out_sti(s, TCG_TYPE_PTR, (uintptr_t)l->raddr, TCG_REG_ESP, ofs);
 } else {
 tcg_out_mov(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[0], TCG_AREG0);
-/* The second argument is already loaded with addrlo.  */
+tcg_out_mov(s, TCG_TYPE_TL, tcg_target_call_iarg_regs[1],
+l->addrlo_reg);
 tcg_out_movi(s, TCG_TYPE_I32, tcg_target_call_iarg_regs[2], oi);
 tcg_out_movi(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[3],
  (uintptr_t)l->raddr);
@@ -1910,7 +1911,8 @@ static bool tcg_out_qemu_st_slow_path(TCGContext *s, 
TCGLabelQemuLdst *l)
 tcg_out_st(s, TCG_TYPE_PTR, retaddr, TCG_REG_ESP, ofs);
 } else {
 tcg_out_mov(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[0], TCG_AREG0);
-/* The second argument is already loaded with addrlo.  */
+tcg_out_mov(s, TCG_TYPE_TL, tcg_target_call_iarg_regs[1],
+l->addrlo_reg);
 tcg_out_mov(s, (s_bits == MO_64 ? TCG_TYPE_I64 : TCG_TYPE_I32),
 tcg_target_call_iarg_regs[2], l->datalo_reg);
 tcg_out_movi(s, TCG_TYPE_I32, tcg_target_call_iarg_regs[3], oi);
@@ -2083,16 +2085,6 @@ static TCGLabelQemuLdst *prepare_host_addr(TCGContext 
*s, HostAddress *h,
 tcg_out_modrm_offset(s, OPC_CMP_GvEv + trexw,
  TCG_REG_L1, TCG_REG_L0, cmp_ofs);
 
-/*
- * Prepare for both the fast path add of the tlb addend, and the slow
- * path function argument setup.
- */
-*h = (HostAddress) {
-.base = TCG_REG_L1,
-.index = -1
-};
-tcg_out_mov(s, ttype, h->base, addrlo);
-
 /* jne slow_path */
 tcg_out_opc(s, OPC_JCC_long + JCC_JNE, 0, 0, 0);
 ldst->label_ptr[0] = s->code_ptr;
@@ -2109,10 +2101,13 @@ static TCGLabelQemuLdst *prepare_host_addr(TCGContext 
*s, HostAddress *h,
 }
 
 /* TLB Hit.  */
+tcg_out_ld(s, TCG_TYPE_PTR, TCG_REG_L0, TCG_REG_L0,
+   offsetof(CPUTLBEntry, addend));
 
-/* add addend(TCG_REG_L0), TCG_REG_L1 */
-tcg_out_modrm_offset(s, OPC_ADD_GvEv + hrexw, h->base, TCG_REG_L0,
- offsetof(CPUTLBEntry, addend));
+*h = (HostAddress) {
+.base = addrlo,
+.index = TCG_REG_L0,
+};
 #else
 if (a_bits) {
 ldst = new_ldst_label(s);
-- 
2.34.1




[PATCH v5 22/30] tcg/mips: Reorg tlb load within prepare_host_addr

2023-05-06 Thread Richard Henderson
Compare the address vs the tlb entry with sign-extended values.
This simplifies the page+alignment mask constant, and the
generation of the last byte address for the misaligned test.

Move the tlb addend load up, and the zero-extension down.

This frees up a register, which allows us to use TMP3 as the returned base
address register instead of A0, which we were using as a 5th temporary.

Signed-off-by: Richard Henderson 
---
 tcg/mips/tcg-target.c.inc | 38 ++
 1 file changed, 18 insertions(+), 20 deletions(-)

diff --git a/tcg/mips/tcg-target.c.inc b/tcg/mips/tcg-target.c.inc
index 31d58e1977..695c137023 100644
--- a/tcg/mips/tcg-target.c.inc
+++ b/tcg/mips/tcg-target.c.inc
@@ -370,6 +370,8 @@ typedef enum {
 ALIAS_PADDI= sizeof(void *) == 4 ? OPC_ADDIU : OPC_DADDIU,
 ALIAS_TSRL = TARGET_LONG_BITS == 32 || TCG_TARGET_REG_BITS == 32
  ? OPC_SRL : OPC_DSRL,
+ALIAS_TADDI= TARGET_LONG_BITS == 32 || TCG_TARGET_REG_BITS == 32
+ ? OPC_ADDIU : OPC_DADDIU,
 } MIPSInsn;
 
 /*
@@ -1263,14 +1265,12 @@ static TCGLabelQemuLdst *prepare_host_addr(TCGContext 
*s, HostAddress *h,
 int add_off = offsetof(CPUTLBEntry, addend);
 int cmp_off = is_ld ? offsetof(CPUTLBEntry, addr_read)
 : offsetof(CPUTLBEntry, addr_write);
-target_ulong tlb_mask;
 
 ldst = new_ldst_label(s);
 ldst->is_ld = is_ld;
 ldst->oi = oi;
 ldst->addrlo_reg = addrlo;
 ldst->addrhi_reg = addrhi;
-base = TCG_REG_A0;
 
 /* Load tlb_mask[mmu_idx] and tlb_table[mmu_idx].  */
 QEMU_BUILD_BUG_ON(TLB_MASK_TABLE_OFS(0) > 0);
@@ -1290,15 +1290,12 @@ static TCGLabelQemuLdst *prepare_host_addr(TCGContext 
*s, HostAddress *h,
 if (TCG_TARGET_REG_BITS < TARGET_LONG_BITS) {
 tcg_out_ldst(s, OPC_LW, TCG_TMP0, TCG_TMP3, cmp_off + LO_OFF);
 } else {
-tcg_out_ldst(s, (TARGET_LONG_BITS == 64 ? OPC_LD
- : TCG_TARGET_REG_BITS == 64 ? OPC_LWU : OPC_LW),
- TCG_TMP0, TCG_TMP3, cmp_off);
+tcg_out_ld(s, TCG_TYPE_TL, TCG_TMP0, TCG_TMP3, cmp_off);
 }
 
-/* Zero extend a 32-bit guest address for a 64-bit host. */
-if (TCG_TARGET_REG_BITS > TARGET_LONG_BITS) {
-tcg_out_ext32u(s, base, addrlo);
-addrlo = base;
+if (TCG_TARGET_REG_BITS >= TARGET_LONG_BITS) {
+/* Load the tlb addend for the fast path.  */
+tcg_out_ld(s, TCG_TYPE_PTR, TCG_TMP3, TCG_TMP3, add_off);
 }
 
 /*
@@ -1306,18 +1303,18 @@ static TCGLabelQemuLdst *prepare_host_addr(TCGContext 
*s, HostAddress *h,
  * For unaligned accesses, compare against the end of the access to
  * verify that it does not cross a page boundary.
  */
-tlb_mask = (target_ulong)TARGET_PAGE_MASK | a_mask;
-tcg_out_movi(s, TCG_TYPE_I32, TCG_TMP1, tlb_mask);
-if (a_mask >= s_mask) {
-tcg_out_opc_reg(s, OPC_AND, TCG_TMP1, TCG_TMP1, addrlo);
-} else {
-tcg_out_opc_imm(s, ALIAS_PADDI, TCG_TMP2, addrlo, s_mask - a_mask);
+tcg_out_movi(s, TCG_TYPE_TL, TCG_TMP1, TARGET_PAGE_MASK | a_mask);
+if (a_mask < s_mask) {
+tcg_out_opc_imm(s, ALIAS_TADDI, TCG_TMP2, addrlo, s_mask - a_mask);
 tcg_out_opc_reg(s, OPC_AND, TCG_TMP1, TCG_TMP1, TCG_TMP2);
+} else {
+tcg_out_opc_reg(s, OPC_AND, TCG_TMP1, TCG_TMP1, addrlo);
 }
 
-if (TCG_TARGET_REG_BITS >= TARGET_LONG_BITS) {
-/* Load the tlb addend for the fast path.  */
-tcg_out_ld(s, TCG_TYPE_PTR, TCG_TMP2, TCG_TMP3, add_off);
+/* Zero extend a 32-bit guest address for a 64-bit host. */
+if (TCG_TARGET_REG_BITS > TARGET_LONG_BITS) {
+tcg_out_ext32u(s, TCG_TMP2, addrlo);
+addrlo = TCG_TMP2;
 }
 
 ldst->label_ptr[0] = s->code_ptr;
@@ -1329,14 +1326,15 @@ static TCGLabelQemuLdst *prepare_host_addr(TCGContext 
*s, HostAddress *h,
 tcg_out_ldst(s, OPC_LW, TCG_TMP0, TCG_TMP3, cmp_off + HI_OFF);
 
 /* Load the tlb addend for the fast path.  */
-tcg_out_ld(s, TCG_TYPE_PTR, TCG_TMP2, TCG_TMP3, add_off);
+tcg_out_ld(s, TCG_TYPE_PTR, TCG_TMP3, TCG_TMP3, add_off);
 
 ldst->label_ptr[1] = s->code_ptr;
 tcg_out_opc_br(s, OPC_BNE, addrhi, TCG_TMP0);
 }
 
 /* delay slot */
-tcg_out_opc_reg(s, ALIAS_PADD, base, TCG_TMP2, addrlo);
+base = TCG_TMP3;
+tcg_out_opc_reg(s, ALIAS_PADD, base, TCG_TMP3, addrlo);
 #else
 if (a_mask && (use_mips32r6_instructions || a_bits != s_bits)) {
 ldst = new_ldst_label(s);
-- 
2.34.1
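
To see why the sign-extended comparison simplifies the constant, it may
help to work one example (my arithmetic, assuming 4 KiB pages and
a_mask = 3; the numbers are not from the patch):

    zero-extended:  TARGET_PAGE_MASK | a_mask = 0x00000000fffff003
                    (a full 64-bit constant, several insns on mips64)
    sign-extended:  TARGET_PAGE_MASK | a_mask = 0xfffffffffffff003
                    (a sign-extended 32-bit value, one LUI-class sequence)

Comparing both the address and the tlb entry as sign-extended values makes
the cheap form the correct one, as the commit message says.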




[PATCH v5 03/30] tcg/aarch64: Introduce prepare_host_addr

2023-05-06 Thread Richard Henderson
Merge tcg_out_tlb_load, add_qemu_ldst_label, tcg_out_test_alignment,
and some code that lived in both tcg_out_qemu_ld and tcg_out_qemu_st
into one function that returns HostAddress and TCGLabelQemuLdst structures.

Signed-off-by: Richard Henderson 
---
 tcg/aarch64/tcg-target.c.inc | 313 +++
 1 file changed, 133 insertions(+), 180 deletions(-)

diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
index d8d464e4a0..202b90c001 100644
--- a/tcg/aarch64/tcg-target.c.inc
+++ b/tcg/aarch64/tcg-target.c.inc
@@ -1667,113 +1667,7 @@ static bool tcg_out_qemu_st_slow_path(TCGContext *s, 
TCGLabelQemuLdst *lb)
 tcg_out_goto(s, lb->raddr);
 return true;
 }
-
-static void add_qemu_ldst_label(TCGContext *s, bool is_ld, MemOpIdx oi,
-TCGType ext, TCGReg data_reg, TCGReg addr_reg,
-tcg_insn_unit *raddr, tcg_insn_unit *label_ptr)
-{
-TCGLabelQemuLdst *label = new_ldst_label(s);
-
-label->is_ld = is_ld;
-label->oi = oi;
-label->type = ext;
-label->datalo_reg = data_reg;
-label->addrlo_reg = addr_reg;
-label->raddr = tcg_splitwx_to_rx(raddr);
-label->label_ptr[0] = label_ptr;
-}
-
-/* We expect to use a 7-bit scaled negative offset from ENV.  */
-QEMU_BUILD_BUG_ON(TLB_MASK_TABLE_OFS(0) > 0);
-QEMU_BUILD_BUG_ON(TLB_MASK_TABLE_OFS(0) < -512);
-
-/* These offsets are built into the LDP below.  */
-QEMU_BUILD_BUG_ON(offsetof(CPUTLBDescFast, mask) != 0);
-QEMU_BUILD_BUG_ON(offsetof(CPUTLBDescFast, table) != 8);
-
-/* Load and compare a TLB entry, emitting the conditional jump to the
-   slow path for the failure case, which will be patched later when finalizing
-   the slow path. Generated code returns the host addend in X1,
-   clobbers X0,X2,X3,TMP. */
-static void tcg_out_tlb_read(TCGContext *s, TCGReg addr_reg, MemOp opc,
- tcg_insn_unit **label_ptr, int mem_index,
- bool is_read)
-{
-unsigned a_bits = get_alignment_bits(opc);
-unsigned s_bits = opc & MO_SIZE;
-unsigned a_mask = (1u << a_bits) - 1;
-unsigned s_mask = (1u << s_bits) - 1;
-TCGReg x3;
-TCGType mask_type;
-uint64_t compare_mask;
-
-mask_type = (TARGET_PAGE_BITS + CPU_TLB_DYN_MAX_BITS > 32
- ? TCG_TYPE_I64 : TCG_TYPE_I32);
-
-/* Load env_tlb(env)->f[mmu_idx].{mask,table} into {x0,x1}.  */
-tcg_out_insn(s, 3314, LDP, TCG_REG_X0, TCG_REG_X1, TCG_AREG0,
- TLB_MASK_TABLE_OFS(mem_index), 1, 0);
-
-/* Extract the TLB index from the address into X0.  */
-tcg_out_insn(s, 3502S, AND_LSR, mask_type == TCG_TYPE_I64,
- TCG_REG_X0, TCG_REG_X0, addr_reg,
- TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS);
-
-/* Add the tlb_table pointer, creating the CPUTLBEntry address into X1.  */
-tcg_out_insn(s, 3502, ADD, 1, TCG_REG_X1, TCG_REG_X1, TCG_REG_X0);
-
-/* Load the tlb comparator into X0, and the fast path addend into X1.  */
-tcg_out_ld(s, TCG_TYPE_TL, TCG_REG_X0, TCG_REG_X1, is_read
-   ? offsetof(CPUTLBEntry, addr_read)
-   : offsetof(CPUTLBEntry, addr_write));
-tcg_out_ld(s, TCG_TYPE_PTR, TCG_REG_X1, TCG_REG_X1,
-   offsetof(CPUTLBEntry, addend));
-
-/* For aligned accesses, we check the first byte and include the alignment
-   bits within the address.  For unaligned access, we check that we don't
-   cross pages using the address of the last byte of the access.  */
-if (a_bits >= s_bits) {
-x3 = addr_reg;
-} else {
-tcg_out_insn(s, 3401, ADDI, TARGET_LONG_BITS == 64,
- TCG_REG_X3, addr_reg, s_mask - a_mask);
-x3 = TCG_REG_X3;
-}
-compare_mask = (uint64_t)TARGET_PAGE_MASK | a_mask;
-
-/* Store the page mask part of the address into X3.  */
-tcg_out_logicali(s, I3404_ANDI, TARGET_LONG_BITS == 64,
- TCG_REG_X3, x3, compare_mask);
-
-/* Perform the address comparison. */
-tcg_out_cmp(s, TARGET_LONG_BITS == 64, TCG_REG_X0, TCG_REG_X3, 0);
-
-/* If not equal, we jump to the slow path. */
-*label_ptr = s->code_ptr;
-tcg_out_insn(s, 3202, B_C, TCG_COND_NE, 0);
-}
-
 #else
-static void tcg_out_test_alignment(TCGContext *s, bool is_ld, TCGReg addr_reg,
-   unsigned a_bits)
-{
-unsigned a_mask = (1 << a_bits) - 1;
-TCGLabelQemuLdst *label = new_ldst_label(s);
-
-label->is_ld = is_ld;
-label->addrlo_reg = addr_reg;
-
-/* tst addr, #mask */
-tcg_out_logicali(s, I3404_ANDSI, 0, TCG_REG_XZR, addr_reg, a_mask);
-
-label->label_ptr[0] = s->code_ptr;
-
-/* b.ne slow_path */
-tcg_out_insn(s, 3202, B_C, TCG_COND_NE, 0);
-
-label->raddr = tcg_splitwx_to_rx(s->code_ptr);
-}
-
 static bool tcg_out_fail_alignment(TCGContext *s, TCGLabelQemuLdst *l)
 {
 if (!reloc_pc19(l->label_ptr[0], tcg_splitwx_to_rx(s->code_ptr))) {
@@ -1801,6 +1695,125 @@ 

[PATCH v5 00/30] tcg: Simplify calls to load/store helpers

2023-05-06 Thread Richard Henderson
There are several changes to the load/store helpers coming, and making
sure that those changes are properly reflected across all of the backends
was harrowing.

I have gone back and restarted by hoisting the code out of the backends
and into tcg.c.  We already have all of the parameters for the host
function call abi for "normal" helpers; we simply need to apply that to
the load/store slow path.

Changes for v5:
  * 24 patches upstreamed; 6 of the remaining 30 have reviews, but
could not be merged out of order.


r~


Richard Henderson (30):
  tcg/i386: Introduce prepare_host_addr
  tcg/i386: Use indexed addressing for softmmu fast path
  tcg/aarch64: Introduce prepare_host_addr
  tcg/arm: Introduce prepare_host_addr
  tcg/loongarch64: Introduce prepare_host_addr
  tcg/mips: Introduce prepare_host_addr
  tcg/ppc: Introduce prepare_host_addr
  tcg/riscv: Introduce prepare_host_addr
  tcg/s390x: Introduce prepare_host_addr
  tcg: Add routines for calling slow-path helpers
  tcg/i386: Convert tcg_out_qemu_ld_slow_path
  tcg/i386: Convert tcg_out_qemu_st_slow_path
  tcg/aarch64: Convert tcg_out_qemu_{ld,st}_slow_path
  tcg/arm: Convert tcg_out_qemu_{ld,st}_slow_path
  tcg/loongarch64: Convert tcg_out_qemu_{ld,st}_slow_path
  tcg/mips: Convert tcg_out_qemu_{ld,st}_slow_path
  tcg/ppc: Convert tcg_out_qemu_{ld,st}_slow_path
  tcg/riscv: Convert tcg_out_qemu_{ld,st}_slow_path
  tcg/s390x: Convert tcg_out_qemu_{ld,st}_slow_path
  tcg/loongarch64: Simplify constraints on qemu_ld/st
  tcg/mips: Remove MO_BSWAP handling
  tcg/mips: Reorg tlb load within prepare_host_addr
  tcg/mips: Simplify constraints on qemu_ld/st
  tcg/ppc: Reorg tcg_out_tlb_read
  tcg/ppc: Adjust constraints on qemu_ld/st
  tcg/ppc: Remove unused constraints A, B, C, D
  tcg/ppc: Remove unused constraint J
  tcg/riscv: Simplify constraints on qemu_ld/st
  tcg/s390x: Use ALGFR in constructing softmmu host address
  tcg/s390x: Simplify constraints on qemu_ld/st

 tcg/loongarch64/tcg-target-con-set.h |   2 -
 tcg/loongarch64/tcg-target-con-str.h |   1 -
 tcg/mips/tcg-target-con-set.h|  13 +-
 tcg/mips/tcg-target-con-str.h|   2 -
 tcg/mips/tcg-target.h|   4 +-
 tcg/ppc/tcg-target-con-set.h |  11 +-
 tcg/ppc/tcg-target-con-str.h |   7 -
 tcg/riscv/tcg-target-con-set.h   |   2 -
 tcg/riscv/tcg-target-con-str.h   |   1 -
 tcg/s390x/tcg-target-con-set.h   |   2 -
 tcg/s390x/tcg-target-con-str.h   |   1 -
 tcg/tcg.c| 456 +-
 tcg/aarch64/tcg-target.c.inc | 347 +--
 tcg/arm/tcg-target.c.inc | 455 +-
 tcg/i386/tcg-target.c.inc| 451 +-
 tcg/loongarch64/tcg-target.c.inc | 313 --
 tcg/mips/tcg-target.c.inc| 870 ---
 tcg/ppc/tcg-target.c.inc | 510 +++-
 tcg/riscv/tcg-target.c.inc   | 304 --
 tcg/s390x/tcg-target.c.inc   | 314 --
 20 files changed, 1766 insertions(+), 2300 deletions(-)

-- 
2.34.1
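
Condensed, the per-backend shape the series converges on looks like the
riscv version below (lifted from patch 18 in this series; other backends
differ mainly in the relocation call, the helper table indexing, and the
contents of ldst_helper_param):

    /* Scratch registers the shared helpers are allowed to clobber. */
    static const TCGLdstHelperParam ldst_helper_param = {
        .ntmp = 3, .tmp = { TCG_REG_TMP0, TCG_REG_TMP1, TCG_REG_TMP2 }
    };

    static bool tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
    {
        MemOp opc = get_memop(l->oi);

        /* Patch the forward branch from the fast path to land here. */
        if (!reloc_sbimm12(l->label_ptr[0], tcg_splitwx_to_rx(s->code_ptr))) {
            return false;
        }

        /* Marshal env, address, oi and return address per the call ABI. */
        tcg_out_ld_helper_args(s, l, &ldst_helper_param);
        tcg_out_call_int(s, qemu_ld_helpers[opc & MO_SSIZE], false);
        /* Move the helper's return value into the destination register. */
        tcg_out_ld_helper_ret(s, l, true, &ldst_helper_param);

        tcg_out_goto(s, l->raddr);
        return true;
    }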




[PATCH v5 24/30] tcg/ppc: Reorg tcg_out_tlb_read

2023-05-06 Thread Richard Henderson
Allocate TCG_REG_TMP2.  Use R0, TMP1, TMP2 instead of any of
the normally allocated registers for the tlb load.

Reviewed-by: Daniel Henrique Barboza 
Signed-off-by: Richard Henderson 
---
 tcg/ppc/tcg-target.c.inc | 84 
 1 file changed, 51 insertions(+), 33 deletions(-)

diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
index 042136fee7..6850ecbc80 100644
--- a/tcg/ppc/tcg-target.c.inc
+++ b/tcg/ppc/tcg-target.c.inc
@@ -68,6 +68,7 @@
 #else
 # define TCG_REG_TMP1   TCG_REG_R12
 #endif
+#define TCG_REG_TMP2TCG_REG_R11
 
 #define TCG_VEC_TMP1TCG_REG_V0
 #define TCG_VEC_TMP2TCG_REG_V1
@@ -2015,13 +2016,11 @@ static TCGReg ldst_ra_gen(TCGContext *s, const 
TCGLabelQemuLdst *l, int arg)
 /*
  * For the purposes of ppc32 sorting 4 input registers into 4 argument
  * registers, there is an outside chance we would require 3 temps.
- * Because of constraints, no inputs are in r3, and env will not be
- * placed into r3 until after the sorting is done, and is thus free.
  */
 static const TCGLdstHelperParam ldst_helper_param = {
 .ra_gen = ldst_ra_gen,
 .ntmp = 3,
-.tmp = { TCG_REG_TMP1, TCG_REG_R0, TCG_REG_R3 }
+.tmp = { TCG_REG_TMP1, TCG_REG_TMP2, TCG_REG_R0 }
 };
 
 static bool tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
@@ -2135,41 +2134,44 @@ static TCGLabelQemuLdst *prepare_host_addr(TCGContext 
*s, HostAddress *h,
 /* Load tlb_mask[mmu_idx] and tlb_table[mmu_idx].  */
 QEMU_BUILD_BUG_ON(TLB_MASK_TABLE_OFS(0) > 0);
 QEMU_BUILD_BUG_ON(TLB_MASK_TABLE_OFS(0) < -32768);
-tcg_out_ld(s, TCG_TYPE_PTR, TCG_REG_R3, TCG_AREG0, mask_off);
-tcg_out_ld(s, TCG_TYPE_PTR, TCG_REG_R4, TCG_AREG0, table_off);
+tcg_out_ld(s, TCG_TYPE_PTR, TCG_REG_TMP1, TCG_AREG0, mask_off);
+tcg_out_ld(s, TCG_TYPE_PTR, TCG_REG_TMP2, TCG_AREG0, table_off);
 
 /* Extract the page index, shifted into place for tlb index.  */
 if (TCG_TARGET_REG_BITS == 32) {
-tcg_out_shri32(s, TCG_REG_TMP1, addrlo,
+tcg_out_shri32(s, TCG_REG_R0, addrlo,
TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS);
 } else {
-tcg_out_shri64(s, TCG_REG_TMP1, addrlo,
+tcg_out_shri64(s, TCG_REG_R0, addrlo,
TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS);
 }
-tcg_out32(s, AND | SAB(TCG_REG_R3, TCG_REG_R3, TCG_REG_TMP1));
+tcg_out32(s, AND | SAB(TCG_REG_TMP1, TCG_REG_TMP1, TCG_REG_R0));
 
-/* Load the TLB comparator.  */
+/* Load the (low part) TLB comparator into TMP2.  */
 if (cmp_off == 0 && TCG_TARGET_REG_BITS >= TARGET_LONG_BITS) {
 uint32_t lxu = (TCG_TARGET_REG_BITS == 32 || TARGET_LONG_BITS == 32
 ? LWZUX : LDUX);
-tcg_out32(s, lxu | TAB(TCG_REG_TMP1, TCG_REG_R3, TCG_REG_R4));
+tcg_out32(s, lxu | TAB(TCG_REG_TMP2, TCG_REG_TMP1, TCG_REG_TMP2));
 } else {
-tcg_out32(s, ADD | TAB(TCG_REG_R3, TCG_REG_R3, TCG_REG_R4));
+tcg_out32(s, ADD | TAB(TCG_REG_TMP1, TCG_REG_TMP1, TCG_REG_TMP2));
 if (TCG_TARGET_REG_BITS < TARGET_LONG_BITS) {
-tcg_out_ld(s, TCG_TYPE_I32, TCG_REG_TMP1, TCG_REG_R3, cmp_off + 4);
-tcg_out_ld(s, TCG_TYPE_I32, TCG_REG_R4, TCG_REG_R3, cmp_off);
+tcg_out_ld(s, TCG_TYPE_I32, TCG_REG_TMP2,
+   TCG_REG_TMP1, cmp_off + 4 * HOST_BIG_ENDIAN);
 } else {
-tcg_out_ld(s, TCG_TYPE_TL, TCG_REG_TMP1, TCG_REG_R3, cmp_off);
+tcg_out_ld(s, TCG_TYPE_TL, TCG_REG_TMP2, TCG_REG_TMP1, cmp_off);
 }
 }
 
-/* Load the TLB addend for use on the fast path.  Do this asap
-   to minimize any load use delay.  */
-h->base = TCG_REG_R3;
-tcg_out_ld(s, TCG_TYPE_PTR, h->base, TCG_REG_R3,
-   offsetof(CPUTLBEntry, addend));
+/*
+ * Load the TLB addend for use on the fast path.
+ * Do this asap to minimize any load use delay.
+ */
+if (TCG_TARGET_REG_BITS >= TARGET_LONG_BITS) {
+tcg_out_ld(s, TCG_TYPE_PTR, TCG_REG_TMP1, TCG_REG_TMP1,
+   offsetof(CPUTLBEntry, addend));
+}
 
-/* Clear the non-page, non-alignment bits from the address */
+/* Clear the non-page, non-alignment bits from the address in R0. */
 if (TCG_TARGET_REG_BITS == 32) {
 /* We don't support unaligned accesses on 32-bits.
  * Preserve the bottom bits and thus trigger a comparison
@@ -2200,9 +2202,6 @@ static TCGLabelQemuLdst *prepare_host_addr(TCGContext *s, 
HostAddress *h,
 if (TARGET_LONG_BITS == 32) {
 tcg_out_rlw(s, RLWINM, TCG_REG_R0, t, 0,
 (32 - a_bits) & 31, 31 - TARGET_PAGE_BITS);
-/* Zero-extend the address for use in the final address.  */
-tcg_out_ext32u(s, TCG_REG_R4, addrlo);
-addrlo = TCG_REG_R4;
 } else if (a_bits == 0) {
 tcg_out_rld(s, RLDICR, TCG_REG_R0, t, 0, 63 - TARGET_PAGE_BITS);
 } else {
@@ -2211,21 +221

[PATCH v5 20/30] tcg/loongarch64: Simplify constraints on qemu_ld/st

2023-05-06 Thread Richard Henderson
The softmmu tlb uses TCG_REG_TMP[0-2], not any of the normally available
registers.  Now that we handle overlap between inputs and helper arguments,
we can allow any allocatable reg.

Signed-off-by: Richard Henderson 
---
 tcg/loongarch64/tcg-target-con-set.h |  2 --
 tcg/loongarch64/tcg-target-con-str.h |  1 -
 tcg/loongarch64/tcg-target.c.inc | 23 ---
 3 files changed, 4 insertions(+), 22 deletions(-)

diff --git a/tcg/loongarch64/tcg-target-con-set.h 
b/tcg/loongarch64/tcg-target-con-set.h
index 172c107289..c2bde44613 100644
--- a/tcg/loongarch64/tcg-target-con-set.h
+++ b/tcg/loongarch64/tcg-target-con-set.h
@@ -17,9 +17,7 @@
 C_O0_I1(r)
 C_O0_I2(rZ, r)
 C_O0_I2(rZ, rZ)
-C_O0_I2(LZ, L)
 C_O1_I1(r, r)
-C_O1_I1(r, L)
 C_O1_I2(r, r, rC)
 C_O1_I2(r, r, ri)
 C_O1_I2(r, r, rI)
diff --git a/tcg/loongarch64/tcg-target-con-str.h 
b/tcg/loongarch64/tcg-target-con-str.h
index 541ff47fa9..6e9ccca3ad 100644
--- a/tcg/loongarch64/tcg-target-con-str.h
+++ b/tcg/loongarch64/tcg-target-con-str.h
@@ -14,7 +14,6 @@
  * REGS(letter, register_mask)
  */
 REGS('r', ALL_GENERAL_REGS)
-REGS('L', ALL_GENERAL_REGS & ~SOFTMMU_RESERVE_REGS)
 
 /*
  * Define constraint letters for constants:
diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index 60d2c904dd..83fa45c802 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -133,18 +133,7 @@ static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind 
kind, int slot)
 #define TCG_CT_CONST_C12   0x1000
 #define TCG_CT_CONST_WSZ   0x2000
 
-#define ALL_GENERAL_REGS  MAKE_64BIT_MASK(0, 32)
-/*
- * For softmmu, we need to avoid conflicts with the first 5
- * argument registers to call the helper.  Some of these are
- * also used for the tlb lookup.
- */
-#ifdef CONFIG_SOFTMMU
-#define SOFTMMU_RESERVE_REGS  MAKE_64BIT_MASK(TCG_REG_A0, 5)
-#else
-#define SOFTMMU_RESERVE_REGS  0
-#endif
-
+#define ALL_GENERAL_REGS   MAKE_64BIT_MASK(0, 32)
 
 static inline tcg_target_long sextreg(tcg_target_long val, int pos, int len)
 {
@@ -1541,16 +1530,14 @@ static TCGConstraintSetIndex 
tcg_target_op_def(TCGOpcode op)
 case INDEX_op_st32_i64:
 case INDEX_op_st_i32:
 case INDEX_op_st_i64:
+case INDEX_op_qemu_st_i32:
+case INDEX_op_qemu_st_i64:
 return C_O0_I2(rZ, r);
 
 case INDEX_op_brcond_i32:
 case INDEX_op_brcond_i64:
 return C_O0_I2(rZ, rZ);
 
-case INDEX_op_qemu_st_i32:
-case INDEX_op_qemu_st_i64:
-return C_O0_I2(LZ, L);
-
 case INDEX_op_ext8s_i32:
 case INDEX_op_ext8s_i64:
 case INDEX_op_ext8u_i32:
@@ -1586,11 +1573,9 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode 
op)
 case INDEX_op_ld32u_i64:
 case INDEX_op_ld_i32:
 case INDEX_op_ld_i64:
-return C_O1_I1(r, r);
-
 case INDEX_op_qemu_ld_i32:
 case INDEX_op_qemu_ld_i64:
-return C_O1_I1(r, L);
+return C_O1_I1(r, r);
 
 case INDEX_op_andc_i32:
 case INDEX_op_andc_i64:
-- 
2.34.1




[PATCH v5 26/30] tcg/ppc: Remove unused constraints A, B, C, D

2023-05-06 Thread Richard Henderson
These constraints have not been used for quite some time.

Fixes: 77b73de67632 ("Use rem/div[u]_i32 drop div[u]2_i32")
Reviewed-by: Daniel Henrique Barboza 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 tcg/ppc/tcg-target-con-str.h | 4 
 1 file changed, 4 deletions(-)

diff --git a/tcg/ppc/tcg-target-con-str.h b/tcg/ppc/tcg-target-con-str.h
index f3bf030bc3..9dcbc3df50 100644
--- a/tcg/ppc/tcg-target-con-str.h
+++ b/tcg/ppc/tcg-target-con-str.h
@@ -10,10 +10,6 @@
  */
 REGS('r', ALL_GENERAL_REGS)
 REGS('v', ALL_VECTOR_REGS)
-REGS('A', 1u << TCG_REG_R3)
-REGS('B', 1u << TCG_REG_R4)
-REGS('C', 1u << TCG_REG_R5)
-REGS('D', 1u << TCG_REG_R6)
 
 /*
  * Define constraint letters for constants:
-- 
2.34.1




[PATCH v5 09/30] tcg/s390x: Introduce prepare_host_addr

2023-05-06 Thread Richard Henderson
Merge tcg_out_tlb_load, add_qemu_ldst_label, tcg_out_test_alignment,
tcg_prepare_user_ldst, and some code that lived in both tcg_out_qemu_ld
and tcg_out_qemu_st into one function that returns HostAddress and
TCGLabelQemuLdst structures.
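
As a sketch of the resulting caller shape (for illustration only; the
exact signature of tcg_out_qemu_ld_direct is backend-specific, and the
hunks below are truncated in this archive):

    static void tcg_out_qemu_ld(TCGContext *s, TCGReg data_reg, TCGReg addr_reg,
                                MemOpIdx oi, TCGType data_type)
    {
        TCGLabelQemuLdst *ldst;
        HostAddress h;

        /* Returns NULL when no slow path is required. */
        ldst = prepare_host_addr(s, &h, addr_reg, oi, true);

        tcg_out_qemu_ld_direct(s, get_memop(oi), data_reg, h);

        if (ldst) {
            ldst->type = data_type;
            ldst->datalo_reg = data_reg;
            ldst->raddr = tcg_splitwx_to_rx(s->code_ptr);
        }
    }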

Signed-off-by: Richard Henderson 
---
 tcg/s390x/tcg-target.c.inc | 263 -
 1 file changed, 113 insertions(+), 150 deletions(-)

diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
index da7ee5b085..c3157d22be 100644
--- a/tcg/s390x/tcg-target.c.inc
+++ b/tcg/s390x/tcg-target.c.inc
@@ -1718,78 +1718,6 @@ static void tcg_out_qemu_st_direct(TCGContext *s, MemOp opc, TCGReg data,
 }
 
 #if defined(CONFIG_SOFTMMU)
-/* We're expecting to use a 20-bit negative offset on the tlb memory ops.  */
-QEMU_BUILD_BUG_ON(TLB_MASK_TABLE_OFS(0) > 0);
-QEMU_BUILD_BUG_ON(TLB_MASK_TABLE_OFS(0) < -(1 << 19));
-
-/* Load and compare a TLB entry, leaving the flags set.  Loads the TLB
-   addend into R2.  Returns a register with the santitized guest address.  */
-static TCGReg tcg_out_tlb_read(TCGContext *s, TCGReg addr_reg, MemOp opc,
-   int mem_index, bool is_ld)
-{
-unsigned s_bits = opc & MO_SIZE;
-unsigned a_bits = get_alignment_bits(opc);
-unsigned s_mask = (1 << s_bits) - 1;
-unsigned a_mask = (1 << a_bits) - 1;
-int fast_off = TLB_MASK_TABLE_OFS(mem_index);
-int mask_off = fast_off + offsetof(CPUTLBDescFast, mask);
-int table_off = fast_off + offsetof(CPUTLBDescFast, table);
-int ofs, a_off;
-uint64_t tlb_mask;
-
-tcg_out_sh64(s, RSY_SRLG, TCG_REG_R2, addr_reg, TCG_REG_NONE,
- TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS);
-tcg_out_insn(s, RXY, NG, TCG_REG_R2, TCG_AREG0, TCG_REG_NONE, mask_off);
-tcg_out_insn(s, RXY, AG, TCG_REG_R2, TCG_AREG0, TCG_REG_NONE, table_off);
-
-/* For aligned accesses, we check the first byte and include the alignment
-   bits within the address.  For unaligned access, we check that we don't
-   cross pages using the address of the last byte of the access.  */
-a_off = (a_bits >= s_bits ? 0 : s_mask - a_mask);
-tlb_mask = (uint64_t)TARGET_PAGE_MASK | a_mask;
-if (a_off == 0) {
-tgen_andi_risbg(s, TCG_REG_R3, addr_reg, tlb_mask);
-} else {
-tcg_out_insn(s, RX, LA, TCG_REG_R3, addr_reg, TCG_REG_NONE, a_off);
-tgen_andi(s, TCG_TYPE_TL, TCG_REG_R3, tlb_mask);
-}
-
-if (is_ld) {
-ofs = offsetof(CPUTLBEntry, addr_read);
-} else {
-ofs = offsetof(CPUTLBEntry, addr_write);
-}
-if (TARGET_LONG_BITS == 32) {
-tcg_out_insn(s, RX, C, TCG_REG_R3, TCG_REG_R2, TCG_REG_NONE, ofs);
-} else {
-tcg_out_insn(s, RXY, CG, TCG_REG_R3, TCG_REG_R2, TCG_REG_NONE, ofs);
-}
-
-tcg_out_insn(s, RXY, LG, TCG_REG_R2, TCG_REG_R2, TCG_REG_NONE,
- offsetof(CPUTLBEntry, addend));
-
-if (TARGET_LONG_BITS == 32) {
-tcg_out_ext32u(s, TCG_REG_R3, addr_reg);
-return TCG_REG_R3;
-}
-return addr_reg;
-}
-
-static void add_qemu_ldst_label(TCGContext *s, bool is_ld, MemOpIdx oi,
-TCGType type, TCGReg data, TCGReg addr,
-tcg_insn_unit *raddr, tcg_insn_unit *label_ptr)
-{
-TCGLabelQemuLdst *label = new_ldst_label(s);
-
-label->is_ld = is_ld;
-label->oi = oi;
-label->type = type;
-label->datalo_reg = data;
-label->addrlo_reg = addr;
-label->raddr = tcg_splitwx_to_rx(raddr);
-label->label_ptr[0] = label_ptr;
-}
-
 static bool tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
 {
 TCGReg addr_reg = lb->addrlo_reg;
@@ -1842,26 +1770,6 @@ static bool tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
 return true;
 }
 #else
-static void tcg_out_test_alignment(TCGContext *s, bool is_ld,
-   TCGReg addrlo, unsigned a_bits)
-{
-unsigned a_mask = (1 << a_bits) - 1;
-TCGLabelQemuLdst *l = new_ldst_label(s);
-
-l->is_ld = is_ld;
-l->addrlo_reg = addrlo;
-
-/* We are expecting a_bits to max out at 7, much lower than TMLL. */
-tcg_debug_assert(a_bits < 16);
-tcg_out_insn(s, RI, TMLL, addrlo, a_mask);
-
-tcg_out16(s, RI_BRC | (7 << 4)); /* CC in {1,2,3} */
-l->label_ptr[0] = s->code_ptr;
-s->code_ptr += 1;
-
-l->raddr = tcg_splitwx_to_rx(s->code_ptr);
-}
-
 static bool tcg_out_fail_alignment(TCGContext *s, TCGLabelQemuLdst *l)
 {
 if (!patch_reloc(l->label_ptr[0], R_390_PC16DBL,
@@ -1888,91 +1796,146 @@ static bool tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
 {
 return tcg_out_fail_alignment(s, l);
 }
+#endif /* CONFIG_SOFTMMU */
 
-static HostAddress tcg_prepare_user_ldst(TCGContext *s, TCGReg addr_reg)
+/*
+ * For softmmu, perform the TLB load and compare.
+ * For useronly, perform any required alignment tests.
+ * In both cases, return a TCGLabelQemuLdst structure if the slow path
+ * is required 

[PATCH v5 15/30] tcg/loongarch64: Convert tcg_out_qemu_{ld, st}_slow_path

2023-05-06 Thread Richard Henderson
Use tcg_out_ld_helper_args, tcg_out_ld_helper_ret,
and tcg_out_st_helper_args.

Signed-off-by: Richard Henderson 
---
 tcg/loongarch64/tcg-target.c.inc | 37 ++--
 1 file changed, 11 insertions(+), 26 deletions(-)

diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index 2f2c34b930..60d2c904dd 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -824,51 +824,36 @@ static bool tcg_out_goto(TCGContext *s, const tcg_insn_unit *target)
 return reloc_br_sd10k16(s->code_ptr - 1, target);
 }
 
+static const TCGLdstHelperParam ldst_helper_param = {
+.ntmp = 1, .tmp = { TCG_REG_TMP0 }
+};
+
 static bool tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
 {
-MemOpIdx oi = l->oi;
-MemOp opc = get_memop(oi);
-MemOp size = opc & MO_SIZE;
+MemOp opc = get_memop(l->oi);
 
 /* resolve label address */
 if (!reloc_br_sk16(l->label_ptr[0], tcg_splitwx_to_rx(s->code_ptr))) {
 return false;
 }
 
-/* call load helper */
-tcg_out_mov(s, TCG_TYPE_PTR, TCG_REG_A0, TCG_AREG0);
-tcg_out_mov(s, TCG_TYPE_PTR, TCG_REG_A1, l->addrlo_reg);
-tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_A2, oi);
-tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_A3, (tcg_target_long)l->raddr);
-
-tcg_out_call_int(s, qemu_ld_helpers[size], false);
-
-tcg_out_movext(s, l->type, l->datalo_reg,
-   TCG_TYPE_REG, opc & MO_SSIZE, TCG_REG_A0);
+tcg_out_ld_helper_args(s, l, &ldst_helper_param);
+tcg_out_call_int(s, qemu_ld_helpers[opc & MO_SIZE], false);
+tcg_out_ld_helper_ret(s, l, false, &ldst_helper_param);
 return tcg_out_goto(s, l->raddr);
 }
 
 static bool tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
 {
-MemOpIdx oi = l->oi;
-MemOp opc = get_memop(oi);
-MemOp size = opc & MO_SIZE;
+MemOp opc = get_memop(l->oi);
 
 /* resolve label address */
 if (!reloc_br_sk16(l->label_ptr[0], tcg_splitwx_to_rx(s->code_ptr))) {
 return false;
 }
 
-/* call store helper */
-tcg_out_mov(s, TCG_TYPE_PTR, TCG_REG_A0, TCG_AREG0);
-tcg_out_mov(s, TCG_TYPE_PTR, TCG_REG_A1, l->addrlo_reg);
-tcg_out_movext(s, size == MO_64 ? TCG_TYPE_I32 : TCG_TYPE_I32, TCG_REG_A2,
-   l->type, size, l->datalo_reg);
-tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_A3, oi);
-tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_A4, (tcg_target_long)l->raddr);
-
-tcg_out_call_int(s, qemu_st_helpers[size], false);
-
+tcg_out_st_helper_args(s, l, &ldst_helper_param);
+tcg_out_call_int(s, qemu_st_helpers[opc & MO_SIZE], false);
 return tcg_out_goto(s, l->raddr);
 }
 #else
-- 
2.34.1




[PATCH v5 21/30] tcg/mips: Remove MO_BSWAP handling

2023-05-06 Thread Richard Henderson
While performing the load in the delay slot of the call to the common
bswap helper function is cute, it is not worth the added complexity.
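
For background, a sketch of why this is safe (a conceptual outline of
the generic expander, not code from this patch): with
TCG_TARGET_HAS_MEMORY_BSWAP set to 0, tcg_gen_qemu_ld_i32() in
tcg/tcg-op.c strips MO_BSWAP before emitting the memory op and appends
an explicit byte swap instead, roughly:

    /* conceptual outline of the generic 32-bit load expander */
    bool need_bswap = false;
    if (!TCG_TARGET_HAS_MEMORY_BSWAP && (memop & MO_BSWAP)) {
        memop &= ~MO_BSWAP;        /* backend loads in host byte order */
        need_bswap = true;
    }
    gen_ldst_i32(INDEX_op_qemu_ld_i32, val, addr, memop, idx);
    if (need_bswap) {
        tcg_gen_bswap32_i32(val, val);   /* swap after the load */
    }

so the backend only ever sees host-endian memory operations, and the
helper tables can be indexed by size alone.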

Signed-off-by: Richard Henderson 
---
 tcg/mips/tcg-target.h |   4 +-
 tcg/mips/tcg-target.c.inc | 284 ++
 2 files changed, 48 insertions(+), 240 deletions(-)

diff --git a/tcg/mips/tcg-target.h b/tcg/mips/tcg-target.h
index 2431fc5353..42bd7fff01 100644
--- a/tcg/mips/tcg-target.h
+++ b/tcg/mips/tcg-target.h
@@ -204,8 +204,8 @@ extern bool use_mips32r2_instructions;
 #define TCG_TARGET_HAS_ext16u_i64   0 /* andi rt, rs, 0xffff */
 #endif
 
-#define TCG_TARGET_DEFAULT_MO (0)
-#define TCG_TARGET_HAS_MEMORY_BSWAP 1
+#define TCG_TARGET_DEFAULT_MO   0
+#define TCG_TARGET_HAS_MEMORY_BSWAP 0
 
 #define TCG_TARGET_NEED_LDST_LABELS
 
diff --git a/tcg/mips/tcg-target.c.inc b/tcg/mips/tcg-target.c.inc
index 022960d79a..31d58e1977 100644
--- a/tcg/mips/tcg-target.c.inc
+++ b/tcg/mips/tcg-target.c.inc
@@ -1088,31 +1088,35 @@ static void tcg_out_call(TCGContext *s, const tcg_insn_unit *arg,
 }
 
 #if defined(CONFIG_SOFTMMU)
-static void * const qemu_ld_helpers[(MO_SSIZE | MO_BSWAP) + 1] = {
+static void * const qemu_ld_helpers[MO_SSIZE + 1] = {
 [MO_UB]   = helper_ret_ldub_mmu,
 [MO_SB]   = helper_ret_ldsb_mmu,
-[MO_LEUW] = helper_le_lduw_mmu,
-[MO_LESW] = helper_le_ldsw_mmu,
-[MO_LEUL] = helper_le_ldul_mmu,
-[MO_LEUQ] = helper_le_ldq_mmu,
-[MO_BEUW] = helper_be_lduw_mmu,
-[MO_BESW] = helper_be_ldsw_mmu,
-[MO_BEUL] = helper_be_ldul_mmu,
-[MO_BEUQ] = helper_be_ldq_mmu,
-#if TCG_TARGET_REG_BITS == 64
-[MO_LESL] = helper_le_ldsl_mmu,
-[MO_BESL] = helper_be_ldsl_mmu,
+#if HOST_BIG_ENDIAN
+[MO_UW] = helper_be_lduw_mmu,
+[MO_SW] = helper_be_ldsw_mmu,
+[MO_UL] = helper_be_ldul_mmu,
+[MO_SL] = helper_be_ldsl_mmu,
+[MO_UQ] = helper_be_ldq_mmu,
+#else
+[MO_UW] = helper_le_lduw_mmu,
+[MO_SW] = helper_le_ldsw_mmu,
+[MO_UL] = helper_le_ldul_mmu,
+[MO_UQ] = helper_le_ldq_mmu,
+[MO_SL] = helper_le_ldsl_mmu,
 #endif
 };
 
-static void * const qemu_st_helpers[(MO_SIZE | MO_BSWAP) + 1] = {
+static void * const qemu_st_helpers[MO_SIZE + 1] = {
 [MO_UB]   = helper_ret_stb_mmu,
-[MO_LEUW] = helper_le_stw_mmu,
-[MO_LEUL] = helper_le_stl_mmu,
-[MO_LEUQ] = helper_le_stq_mmu,
-[MO_BEUW] = helper_be_stw_mmu,
-[MO_BEUL] = helper_be_stl_mmu,
-[MO_BEUQ] = helper_be_stq_mmu,
+#if HOST_BIG_ENDIAN
+[MO_UW] = helper_be_stw_mmu,
+[MO_UL] = helper_be_stl_mmu,
+[MO_UQ] = helper_be_stq_mmu,
+#else
+[MO_UW] = helper_le_stw_mmu,
+[MO_UL] = helper_le_stl_mmu,
+[MO_UQ] = helper_le_stq_mmu,
+#endif
 };
 
 /* We have four temps, we might as well expose three of them. */
@@ -1134,7 +1138,7 @@ static bool tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
 
 tcg_out_ld_helper_args(s, l, &ldst_helper_param);
 
-tcg_out_call_int(s, qemu_ld_helpers[opc & (MO_BSWAP | MO_SSIZE)], false);
+tcg_out_call_int(s, qemu_ld_helpers[opc & MO_SSIZE], false);
 /* delay slot */
 tcg_out_nop(s);
 
@@ -1164,7 +1168,7 @@ static bool tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
 
 tcg_out_st_helper_args(s, l, &ldst_helper_param);
 
-tcg_out_call_int(s, qemu_st_helpers[opc & (MO_BSWAP | MO_SIZE)], false);
+tcg_out_call_int(s, qemu_st_helpers[opc & MO_SIZE], false);
 /* delay slot */
 tcg_out_nop(s);
 
@@ -1379,52 +1383,19 @@ static TCGLabelQemuLdst *prepare_host_addr(TCGContext *s, HostAddress *h,
 static void tcg_out_qemu_ld_direct(TCGContext *s, TCGReg lo, TCGReg hi,
TCGReg base, MemOp opc, TCGType type)
 {
-switch (opc & (MO_SSIZE | MO_BSWAP)) {
+switch (opc & MO_SSIZE) {
 case MO_UB:
 tcg_out_opc_imm(s, OPC_LBU, lo, base, 0);
 break;
 case MO_SB:
 tcg_out_opc_imm(s, OPC_LB, lo, base, 0);
 break;
-case MO_UW | MO_BSWAP:
-tcg_out_opc_imm(s, OPC_LHU, TCG_TMP1, base, 0);
-tcg_out_bswap16(s, lo, TCG_TMP1, TCG_BSWAP_IZ | TCG_BSWAP_OZ);
-break;
 case MO_UW:
 tcg_out_opc_imm(s, OPC_LHU, lo, base, 0);
 break;
-case MO_SW | MO_BSWAP:
-tcg_out_opc_imm(s, OPC_LHU, TCG_TMP1, base, 0);
-tcg_out_bswap16(s, lo, TCG_TMP1, TCG_BSWAP_IZ | TCG_BSWAP_OS);
-break;
 case MO_SW:
 tcg_out_opc_imm(s, OPC_LH, lo, base, 0);
 break;
-case MO_UL | MO_BSWAP:
-if (TCG_TARGET_REG_BITS == 64 && type == TCG_TYPE_I64) {
-if (use_mips32r2_instructions) {
-tcg_out_opc_imm(s, OPC_LWU, lo, base, 0);
-tcg_out_bswap32(s, lo, lo, TCG_BSWAP_IZ | TCG_BSWAP_OZ);
-} else {
-tcg_out_bswap_subr(s, bswap32u_addr);
-/* delay slot */
-tcg_out_opc_imm(s, OPC_LWU, TCG_TMP0, base, 0);
-tcg_out_mov(s, TCG_TYPE_I64, lo, TCG_TM

[PATCH v5 10/30] tcg: Add routines for calling slow-path helpers

2023-05-06 Thread Richard Henderson
Add tcg_out_ld_helper_args, tcg_out_ld_helper_ret,
and tcg_out_st_helper_args.  These and their subroutines
use the existing knowledge of the host function call abi
to load the function call arguments and return results.

These will be used to simplify the backends in turn.
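
For example, the loongarch64 conversion later in this series reduces
its load slow path to the following (the scratch register set, the
relocation primitive, and the call/goto helpers are backend-specific):

    static const TCGLdstHelperParam ldst_helper_param = {
        .ntmp = 1, .tmp = { TCG_REG_TMP0 }
    };

    static bool tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
    {
        MemOp opc = get_memop(l->oi);

        /* resolve label address */
        if (!reloc_br_sk16(l->label_ptr[0], tcg_splitwx_to_rx(s->code_ptr))) {
            return false;
        }

        /* marshal env, addr, oi and raddr per the host call ABI */
        tcg_out_ld_helper_args(s, l, &ldst_helper_param);
        tcg_out_call_int(s, qemu_ld_helpers[opc & MO_SIZE], false);
        /* move/extend the helper return value into the data register */
        tcg_out_ld_helper_ret(s, l, false, &ldst_helper_param);
        return tcg_out_goto(s, l->raddr);
    }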

Signed-off-by: Richard Henderson 
---
 tcg/tcg.c | 456 +-
 1 file changed, 453 insertions(+), 3 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index 057423c121..748be8426a 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -181,6 +181,22 @@ static bool tcg_target_const_match(int64_t val, TCGType type, int ct);
 static int tcg_out_ldst_finalize(TCGContext *s);
 #endif
 
+typedef struct TCGLdstHelperParam {
+TCGReg (*ra_gen)(TCGContext *s, const TCGLabelQemuLdst *l, int arg_reg);
+unsigned ntmp;
+int tmp[3];
+} TCGLdstHelperParam;
+
+static void tcg_out_ld_helper_args(TCGContext *s, const TCGLabelQemuLdst *l,
+   const TCGLdstHelperParam *p)
+__attribute__((unused));
+static void tcg_out_ld_helper_ret(TCGContext *s, const TCGLabelQemuLdst *l,
+  bool load_sign, const TCGLdstHelperParam *p)
+__attribute__((unused));
+static void tcg_out_st_helper_args(TCGContext *s, const TCGLabelQemuLdst *l,
+   const TCGLdstHelperParam *p)
+__attribute__((unused));
+
 TCGContext tcg_init_ctx;
 __thread TCGContext *tcg_ctx;
 
@@ -459,9 +475,8 @@ static void tcg_out_movext1(TCGContext *s, const TCGMovExtend *i)
  * between the sources and destinations.
  */
 
-static void __attribute__((unused))
-tcg_out_movext2(TCGContext *s, const TCGMovExtend *i1,
-const TCGMovExtend *i2, int scratch)
+static void tcg_out_movext2(TCGContext *s, const TCGMovExtend *i1,
+const TCGMovExtend *i2, int scratch)
 {
 TCGReg src1 = i1->src;
 TCGReg src2 = i2->src;
@@ -715,6 +730,50 @@ static TCGHelperInfo all_helpers[] = {
 };
 static GHashTable *helper_table;
 
+#if TCG_TARGET_REG_BITS == 32
+# define dh_typecode_ttl  dh_typecode_i32
+#else
+# define dh_typecode_ttl  dh_typecode_i64
+#endif
+
+static TCGHelperInfo info_helper_ld32_mmu = {
+.flags = TCG_CALL_NO_WG,
+.typemask = dh_typemask(ttl, 0)  /* return tcg_target_ulong */
+  | dh_typemask(env, 1)
+  | dh_typemask(tl, 2)   /* target_ulong addr */
+  | dh_typemask(i32, 3)  /* unsigned oi */
+  | dh_typemask(ptr, 4)  /* uintptr_t ra */
+};
+
+static TCGHelperInfo info_helper_ld64_mmu = {
+.flags = TCG_CALL_NO_WG,
+.typemask = dh_typemask(i64, 0)  /* return uint64_t */
+  | dh_typemask(env, 1)
+  | dh_typemask(tl, 2)   /* target_ulong addr */
+  | dh_typemask(i32, 3)  /* unsigned oi */
+  | dh_typemask(ptr, 4)  /* uintptr_t ra */
+};
+
+static TCGHelperInfo info_helper_st32_mmu = {
+.flags = TCG_CALL_NO_WG,
+.typemask = dh_typemask(void, 0)
+  | dh_typemask(env, 1)
+  | dh_typemask(tl, 2)   /* target_ulong addr */
+  | dh_typemask(i32, 3)  /* uint32_t data */
+  | dh_typemask(i32, 4)  /* unsigned oi */
+  | dh_typemask(ptr, 5)  /* uintptr_t ra */
+};
+
+static TCGHelperInfo info_helper_st64_mmu = {
+.flags = TCG_CALL_NO_WG,
+.typemask = dh_typemask(void, 0)
+  | dh_typemask(env, 1)
+  | dh_typemask(tl, 2)   /* target_ulong addr */
+  | dh_typemask(i64, 3)  /* uint64_t data */
+  | dh_typemask(i32, 4)  /* unsigned oi */
+  | dh_typemask(ptr, 5)  /* uintptr_t ra */
+};
+
 #ifdef CONFIG_TCG_INTERPRETER
 static ffi_type *typecode_to_ffi(int argmask)
 {
@@ -1126,6 +1185,11 @@ static void tcg_context_init(unsigned max_cpus)
 (gpointer)&all_helpers[i]);
 }
 
+init_call_layout(&info_helper_ld32_mmu);
+init_call_layout(&info_helper_ld64_mmu);
+init_call_layout(&info_helper_st32_mmu);
+init_call_layout(&info_helper_st64_mmu);
+
 #ifdef CONFIG_TCG_INTERPRETER
 init_ffi_layouts();
 #endif
@@ -5011,6 +5075,392 @@ static void tcg_reg_alloc_call(TCGContext *s, TCGOp *op)
 }
 }
 
+/*
+ * Similarly for qemu_ld/st slow path helpers.
+ * We must re-implement tcg_gen_callN and tcg_reg_alloc_call simultaneously,
+ * using only the provided backend tcg_out_* functions.
+ */
+
+static int tcg_out_helper_stk_ofs(TCGType type, unsigned slot)
+{
+int ofs = arg_slot_stk_ofs(slot);
+
+/*
+ * Each stack slot is TCG_TARGET_LONG_BITS.  If the host does not
+ * require extension to uint64_t, adjust the address for uint32_t.
+ */
+if (HOST_BIG_ENDIAN &&
+TCG_TARGET_REG_BITS == 64 &&
+type == TCG_TYPE_I32) {
+ofs += 4;
+}
+return ofs;
+}
+
+static void tcg_out_helper_load_regs(TCGContext *s,
+ unsigned nmov, TCGMovExtend *mov,
+

[PATCH v5 17/30] tcg/ppc: Convert tcg_out_qemu_{ld,st}_slow_path

2023-05-06 Thread Richard Henderson
Use tcg_out_ld_helper_args, tcg_out_ld_helper_ret,
and tcg_out_st_helper_args.

Reviewed-by: Daniel Henrique Barboza 
Signed-off-by: Richard Henderson 
---
 tcg/ppc/tcg-target.c.inc | 88 
 1 file changed, 26 insertions(+), 62 deletions(-)

diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
index 7239335bdf..042136fee7 100644
--- a/tcg/ppc/tcg-target.c.inc
+++ b/tcg/ppc/tcg-target.c.inc
@@ -2003,44 +2003,38 @@ static void * const qemu_st_helpers[(MO_SIZE | MO_BSWAP) + 1] = {
 [MO_BEUQ] = helper_be_stq_mmu,
 };
 
+static TCGReg ldst_ra_gen(TCGContext *s, const TCGLabelQemuLdst *l, int arg)
+{
+if (arg < 0) {
+arg = TCG_REG_TMP1;
+}
+tcg_out32(s, MFSPR | RT(arg) | LR);
+return arg;
+}
+
+/*
+ * For the purposes of ppc32 sorting 4 input registers into 4 argument
+ * registers, there is an outside chance we would require 3 temps.
+ * Because of constraints, no inputs are in r3, and env will not be
+ * placed into r3 until after the sorting is done, and is thus free.
+ */
+static const TCGLdstHelperParam ldst_helper_param = {
+.ra_gen = ldst_ra_gen,
+.ntmp = 3,
+.tmp = { TCG_REG_TMP1, TCG_REG_R0, TCG_REG_R3 }
+};
+
 static bool tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
 {
-MemOpIdx oi = lb->oi;
-MemOp opc = get_memop(oi);
-TCGReg hi, lo, arg = TCG_REG_R3;
+MemOp opc = get_memop(lb->oi);
 
 if (!reloc_pc14(lb->label_ptr[0], tcg_splitwx_to_rx(s->code_ptr))) {
 return false;
 }
 
-tcg_out_mov(s, TCG_TYPE_PTR, arg++, TCG_AREG0);
-
-lo = lb->addrlo_reg;
-hi = lb->addrhi_reg;
-if (TCG_TARGET_REG_BITS < TARGET_LONG_BITS) {
-arg |= (TCG_TARGET_CALL_ARG_I64 == TCG_CALL_ARG_EVEN);
-tcg_out_mov(s, TCG_TYPE_I32, arg++, hi);
-tcg_out_mov(s, TCG_TYPE_I32, arg++, lo);
-} else {
-/* If the address needed to be zero-extended, we'll have already
-   placed it in R4.  The only remaining case is 64-bit guest.  */
-tcg_out_mov(s, TCG_TYPE_TL, arg++, lo);
-}
-
-tcg_out_movi(s, TCG_TYPE_I32, arg++, oi);
-tcg_out32(s, MFSPR | RT(arg) | LR);
-
+tcg_out_ld_helper_args(s, lb, &ldst_helper_param);
 tcg_out_call_int(s, LK, qemu_ld_helpers[opc & (MO_BSWAP | MO_SIZE)]);
-
-lo = lb->datalo_reg;
-hi = lb->datahi_reg;
-if (TCG_TARGET_REG_BITS == 32 && (opc & MO_SIZE) == MO_64) {
-tcg_out_mov(s, TCG_TYPE_I32, lo, TCG_REG_R4);
-tcg_out_mov(s, TCG_TYPE_I32, hi, TCG_REG_R3);
-} else {
-tcg_out_movext(s, lb->type, lo,
-   TCG_TYPE_REG, opc & MO_SSIZE, TCG_REG_R3);
-}
+tcg_out_ld_helper_ret(s, lb, false, &ldst_helper_param);
 
 tcg_out_b(s, 0, lb->raddr);
 return true;
@@ -2048,43 +2042,13 @@ static bool tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
 
 static bool tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
 {
-MemOpIdx oi = lb->oi;
-MemOp opc = get_memop(oi);
-MemOp s_bits = opc & MO_SIZE;
-TCGReg hi, lo, arg = TCG_REG_R3;
+MemOp opc = get_memop(lb->oi);
 
 if (!reloc_pc14(lb->label_ptr[0], tcg_splitwx_to_rx(s->code_ptr))) {
 return false;
 }
 
-tcg_out_mov(s, TCG_TYPE_PTR, arg++, TCG_AREG0);
-
-lo = lb->addrlo_reg;
-hi = lb->addrhi_reg;
-if (TCG_TARGET_REG_BITS < TARGET_LONG_BITS) {
-arg |= (TCG_TARGET_CALL_ARG_I64 == TCG_CALL_ARG_EVEN);
-tcg_out_mov(s, TCG_TYPE_I32, arg++, hi);
-tcg_out_mov(s, TCG_TYPE_I32, arg++, lo);
-} else {
-/* If the address needed to be zero-extended, we'll have already
-   placed it in R4.  The only remaining case is 64-bit guest.  */
-tcg_out_mov(s, TCG_TYPE_TL, arg++, lo);
-}
-
-lo = lb->datalo_reg;
-hi = lb->datahi_reg;
-if (TCG_TARGET_REG_BITS == 32 && s_bits == MO_64) {
-arg |= (TCG_TARGET_CALL_ARG_I64 == TCG_CALL_ARG_EVEN);
-tcg_out_mov(s, TCG_TYPE_I32, arg++, hi);
-tcg_out_mov(s, TCG_TYPE_I32, arg++, lo);
-} else {
-tcg_out_movext(s, s_bits == MO_64 ? TCG_TYPE_I64 : TCG_TYPE_I32,
-   arg++, lb->type, s_bits, lo);
-}
-
-tcg_out_movi(s, TCG_TYPE_I32, arg++, oi);
-tcg_out32(s, MFSPR | RT(arg) | LR);
-
+tcg_out_st_helper_args(s, lb, &ldst_helper_param);
 tcg_out_call_int(s, LK, qemu_st_helpers[opc & (MO_BSWAP | MO_SIZE)]);
 
 tcg_out_b(s, 0, lb->raddr);
-- 
2.34.1




[PATCH v5 16/30] tcg/mips: Convert tcg_out_qemu_{ld,st}_slow_path

2023-05-06 Thread Richard Henderson
Use tcg_out_ld_helper_args, tcg_out_ld_helper_ret,
and tcg_out_st_helper_args.  This allows our local
tcg_out_arg_* infrastructure to be removed.

We are no longer filling the call or return branch
delay slots, nor are we tail-calling for the store,
but this seems a small price to pay.

Signed-off-by: Richard Henderson 
---
 tcg/mips/tcg-target.c.inc | 154 ++
 1 file changed, 22 insertions(+), 132 deletions(-)

diff --git a/tcg/mips/tcg-target.c.inc b/tcg/mips/tcg-target.c.inc
index 94708e6ea7..022960d79a 100644
--- a/tcg/mips/tcg-target.c.inc
+++ b/tcg/mips/tcg-target.c.inc
@@ -1115,79 +1115,15 @@ static void * const qemu_st_helpers[(MO_SIZE | MO_BSWAP) + 1] = {
 [MO_BEUQ] = helper_be_stq_mmu,
 };
 
-/* Helper routines for marshalling helper function arguments into
- * the correct registers and stack.
- * I is where we want to put this argument, and is updated and returned
- * for the next call. ARG is the argument itself.
- *
- * We provide routines for arguments which are: immediate, 32 bit
- * value in register, 16 and 8 bit values in register (which must be zero
- * extended before use) and 64 bit value in a lo:hi register pair.
- */
-
-static int tcg_out_call_iarg_reg(TCGContext *s, int i, TCGReg arg)
-{
-if (i < ARRAY_SIZE(tcg_target_call_iarg_regs)) {
-tcg_out_mov(s, TCG_TYPE_REG, tcg_target_call_iarg_regs[i], arg);
-} else {
-/* For N32 and N64, the initial offset is different.  But there
-   we also have 8 argument register so we don't run out here.  */
-tcg_debug_assert(TCG_TARGET_REG_BITS == 32);
-tcg_out_st(s, TCG_TYPE_REG, arg, TCG_REG_SP, 4 * i);
-}
-return i + 1;
-}
-
-static int tcg_out_call_iarg_reg8(TCGContext *s, int i, TCGReg arg)
-{
-TCGReg tmp = TCG_TMP0;
-if (i < ARRAY_SIZE(tcg_target_call_iarg_regs)) {
-tmp = tcg_target_call_iarg_regs[i];
-}
-tcg_out_ext8u(s, tmp, arg);
-return tcg_out_call_iarg_reg(s, i, tmp);
-}
-
-static int tcg_out_call_iarg_reg16(TCGContext *s, int i, TCGReg arg)
-{
-TCGReg tmp = TCG_TMP0;
-if (i < ARRAY_SIZE(tcg_target_call_iarg_regs)) {
-tmp = tcg_target_call_iarg_regs[i];
-}
-tcg_out_opc_imm(s, OPC_ANDI, tmp, arg, 0xffff);
-return tcg_out_call_iarg_reg(s, i, tmp);
-}
-
-static int tcg_out_call_iarg_imm(TCGContext *s, int i, TCGArg arg)
-{
-TCGReg tmp = TCG_TMP0;
-if (arg == 0) {
-tmp = TCG_REG_ZERO;
-} else {
-if (i < ARRAY_SIZE(tcg_target_call_iarg_regs)) {
-tmp = tcg_target_call_iarg_regs[i];
-}
-tcg_out_movi(s, TCG_TYPE_REG, tmp, arg);
-}
-return tcg_out_call_iarg_reg(s, i, tmp);
-}
-
-static int tcg_out_call_iarg_reg2(TCGContext *s, int i, TCGReg al, TCGReg ah)
-{
-tcg_debug_assert(TCG_TARGET_REG_BITS == 32);
-i = (i + 1) & ~1;
-i = tcg_out_call_iarg_reg(s, i, (MIPS_BE ? ah : al));
-i = tcg_out_call_iarg_reg(s, i, (MIPS_BE ? al : ah));
-return i;
-}
+/* We have four temps, we might as well expose three of them. */
+static const TCGLdstHelperParam ldst_helper_param = {
+.ntmp = 3, .tmp = { TCG_TMP0, TCG_TMP1, TCG_TMP2 }
+};
 
 static bool tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
 {
 const tcg_insn_unit *tgt_rx = tcg_splitwx_to_rx(s->code_ptr);
-MemOpIdx oi = l->oi;
-MemOp opc = get_memop(oi);
-TCGReg v0;
-int i;
+MemOp opc = get_memop(l->oi);
 
 /* resolve label address */
 if (!reloc_pc16(l->label_ptr[0], tgt_rx)
@@ -1196,29 +1132,13 @@ static bool tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
 return false;
 }
 
-i = 1;
-if (TCG_TARGET_REG_BITS < TARGET_LONG_BITS) {
-i = tcg_out_call_iarg_reg2(s, i, l->addrlo_reg, l->addrhi_reg);
-} else {
-i = tcg_out_call_iarg_reg(s, i, l->addrlo_reg);
-}
-i = tcg_out_call_iarg_imm(s, i, oi);
-i = tcg_out_call_iarg_imm(s, i, (intptr_t)l->raddr);
+tcg_out_ld_helper_args(s, l, &ldst_helper_param);
+
 tcg_out_call_int(s, qemu_ld_helpers[opc & (MO_BSWAP | MO_SSIZE)], false);
 /* delay slot */
-tcg_out_mov(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[0], TCG_AREG0);
+tcg_out_nop(s);
 
-v0 = l->datalo_reg;
-if (TCG_TARGET_REG_BITS == 32 && (opc & MO_SIZE) == MO_64) {
-/* We eliminated V0 from the possible output registers, so it
-   cannot be clobbered here.  So we must move V1 first.  */
-if (MIPS_BE) {
-tcg_out_mov(s, TCG_TYPE_I32, v0, TCG_REG_V1);
-v0 = l->datahi_reg;
-} else {
-tcg_out_mov(s, TCG_TYPE_I32, l->datahi_reg, TCG_REG_V1);
-}
-}
+tcg_out_ld_helper_ret(s, l, true, &ldst_helper_param);
 
 tcg_out_opc_br(s, OPC_BEQ, TCG_REG_ZERO, TCG_REG_ZERO);
 if (!reloc_pc16(s->code_ptr - 1, l->raddr)) {
@@ -1226,22 +1146,14 @@ static bool tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
 }
 
 /* delay slot */
-if (TCG_TARGET_R

[PATCH v5 12/30] tcg/i386: Convert tcg_out_qemu_st_slow_path

2023-05-06 Thread Richard Henderson
Use tcg_out_st_helper_args.  This eliminates the use of a tail call to
the store helper.  This may or may not be an improvement, depending on
the call/return branch prediction of the host microarchitecture.
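
Concretely, the old sequence pushed the return address and jumped to
the helper (both lines below are taken from the hunks that follow):

    tcg_out_push(s, retaddr);
    tcg_out_jmp(s, qemu_st_helpers[opc & (MO_BSWAP | MO_SIZE)]);

while the new sequence is an ordinary call followed by an explicit
jump back to the fast path:

    tcg_out_branch(s, 1, qemu_st_helpers[opc & (MO_BSWAP | MO_SIZE)]);
    tcg_out_jmp(s, l->raddr);

Whether the matched call/ret pair predicts better than a ret whose
pushed address had no matching call depends on the host's
return-stack predictor, hence the hedging above.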

Signed-off-by: Richard Henderson 
---
 tcg/i386/tcg-target.c.inc | 57 +++
 1 file changed, 4 insertions(+), 53 deletions(-)

diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index 17ad3c5963..7dbfcbd20f 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -1854,11 +1854,8 @@ static bool tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
  */
 static bool tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
 {
-MemOpIdx oi = l->oi;
-MemOp opc = get_memop(oi);
-MemOp s_bits = opc & MO_SIZE;
+MemOp opc = get_memop(l->oi);
 tcg_insn_unit **label_ptr = &l->label_ptr[0];
-TCGReg retaddr;
 
 /* resolve label address */
 tcg_patch32(label_ptr[0], s->code_ptr - label_ptr[0] - 4);
@@ -1866,56 +1863,10 @@ static bool tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
 tcg_patch32(label_ptr[1], s->code_ptr - label_ptr[1] - 4);
 }
 
-if (TCG_TARGET_REG_BITS == 32) {
-int ofs = 0;
+tcg_out_st_helper_args(s, l, &ldst_helper_param);
+tcg_out_branch(s, 1, qemu_st_helpers[opc & (MO_BSWAP | MO_SIZE)]);
 
-tcg_out_st(s, TCG_TYPE_PTR, TCG_AREG0, TCG_REG_ESP, ofs);
-ofs += 4;
-
-tcg_out_st(s, TCG_TYPE_I32, l->addrlo_reg, TCG_REG_ESP, ofs);
-ofs += 4;
-
-if (TARGET_LONG_BITS == 64) {
-tcg_out_st(s, TCG_TYPE_I32, l->addrhi_reg, TCG_REG_ESP, ofs);
-ofs += 4;
-}
-
-tcg_out_st(s, TCG_TYPE_I32, l->datalo_reg, TCG_REG_ESP, ofs);
-ofs += 4;
-
-if (s_bits == MO_64) {
-tcg_out_st(s, TCG_TYPE_I32, l->datahi_reg, TCG_REG_ESP, ofs);
-ofs += 4;
-}
-
-tcg_out_sti(s, TCG_TYPE_I32, oi, TCG_REG_ESP, ofs);
-ofs += 4;
-
-retaddr = TCG_REG_EAX;
-tcg_out_movi(s, TCG_TYPE_PTR, retaddr, (uintptr_t)l->raddr);
-tcg_out_st(s, TCG_TYPE_PTR, retaddr, TCG_REG_ESP, ofs);
-} else {
-tcg_out_mov(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[0], TCG_AREG0);
-tcg_out_mov(s, TCG_TYPE_TL, tcg_target_call_iarg_regs[1],
-l->addrlo_reg);
-tcg_out_mov(s, (s_bits == MO_64 ? TCG_TYPE_I64 : TCG_TYPE_I32),
-tcg_target_call_iarg_regs[2], l->datalo_reg);
-tcg_out_movi(s, TCG_TYPE_I32, tcg_target_call_iarg_regs[3], oi);
-
-if (ARRAY_SIZE(tcg_target_call_iarg_regs) > 4) {
-retaddr = tcg_target_call_iarg_regs[4];
-tcg_out_movi(s, TCG_TYPE_PTR, retaddr, (uintptr_t)l->raddr);
-} else {
-retaddr = TCG_REG_RAX;
-tcg_out_movi(s, TCG_TYPE_PTR, retaddr, (uintptr_t)l->raddr);
-tcg_out_st(s, TCG_TYPE_PTR, retaddr, TCG_REG_ESP,
-   TCG_TARGET_CALL_STACK_OFFSET);
-}
-}
-
-/* "Tail call" to the helper, with the return address back inline.  */
-tcg_out_push(s, retaddr);
-tcg_out_jmp(s, qemu_st_helpers[opc & (MO_BSWAP | MO_SIZE)]);
+tcg_out_jmp(s, l->raddr);
 return true;
 }
 #else
-- 
2.34.1




[PATCH v5 13/30] tcg/aarch64: Convert tcg_out_qemu_{ld,st}_slow_path

2023-05-06 Thread Richard Henderson
Use tcg_out_ld_helper_args, tcg_out_ld_helper_ret,
and tcg_out_st_helper_args.

Signed-off-by: Richard Henderson 
---
 tcg/aarch64/tcg-target.c.inc | 40 +++-
 1 file changed, 16 insertions(+), 24 deletions(-)

diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
index 202b90c001..62dd22d73c 100644
--- a/tcg/aarch64/tcg-target.c.inc
+++ b/tcg/aarch64/tcg-target.c.inc
@@ -1580,13 +1580,6 @@ static void tcg_out_cltz(TCGContext *s, TCGType ext, TCGReg d,
 }
 }
 
-static void tcg_out_adr(TCGContext *s, TCGReg rd, const void *target)
-{
-ptrdiff_t offset = tcg_pcrel_diff(s, target);
-tcg_debug_assert(offset == sextract64(offset, 0, 21));
-tcg_out_insn(s, 3406, ADR, rd, offset);
-}
-
 typedef struct {
 TCGReg base;
 TCGReg index;
@@ -1627,47 +1620,46 @@ static void * const qemu_st_helpers[MO_SIZE + 1] = {
 #endif
 };
 
+static const TCGLdstHelperParam ldst_helper_param = {
+.ntmp = 1, .tmp = { TCG_REG_TMP }
+};
+
 static bool tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
 {
-MemOpIdx oi = lb->oi;
-MemOp opc = get_memop(oi);
+MemOp opc = get_memop(lb->oi);
 
 if (!reloc_pc19(lb->label_ptr[0], tcg_splitwx_to_rx(s->code_ptr))) {
 return false;
 }
 
-tcg_out_mov(s, TCG_TYPE_PTR, TCG_REG_X0, TCG_AREG0);
-tcg_out_mov(s, TARGET_LONG_BITS == 64, TCG_REG_X1, lb->addrlo_reg);
-tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_X2, oi);
-tcg_out_adr(s, TCG_REG_X3, lb->raddr);
+tcg_out_ld_helper_args(s, lb, &ldst_helper_param);
 tcg_out_call_int(s, qemu_ld_helpers[opc & MO_SIZE]);
-
-tcg_out_movext(s, lb->type, lb->datalo_reg,
-   TCG_TYPE_REG, opc & MO_SSIZE, TCG_REG_X0);
+tcg_out_ld_helper_ret(s, lb, false, &ldst_helper_param);
 tcg_out_goto(s, lb->raddr);
 return true;
 }
 
 static bool tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
 {
-MemOpIdx oi = lb->oi;
-MemOp opc = get_memop(oi);
-MemOp size = opc & MO_SIZE;
+MemOp opc = get_memop(lb->oi);
 
 if (!reloc_pc19(lb->label_ptr[0], tcg_splitwx_to_rx(s->code_ptr))) {
 return false;
 }
 
-tcg_out_mov(s, TCG_TYPE_PTR, TCG_REG_X0, TCG_AREG0);
-tcg_out_mov(s, TARGET_LONG_BITS == 64, TCG_REG_X1, lb->addrlo_reg);
-tcg_out_mov(s, size == MO_64, TCG_REG_X2, lb->datalo_reg);
-tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_X3, oi);
-tcg_out_adr(s, TCG_REG_X4, lb->raddr);
+tcg_out_st_helper_args(s, lb, &ldst_helper_param);
 tcg_out_call_int(s, qemu_st_helpers[opc & MO_SIZE]);
 tcg_out_goto(s, lb->raddr);
 return true;
 }
 #else
+static void tcg_out_adr(TCGContext *s, TCGReg rd, const void *target)
+{
+ptrdiff_t offset = tcg_pcrel_diff(s, target);
+tcg_debug_assert(offset == sextract64(offset, 0, 21));
+tcg_out_insn(s, 3406, ADR, rd, offset);
+}
+
 static bool tcg_out_fail_alignment(TCGContext *s, TCGLabelQemuLdst *l)
 {
 if (!reloc_pc19(l->label_ptr[0], tcg_splitwx_to_rx(s->code_ptr))) {
-- 
2.34.1




[PATCH v5 04/30] tcg/arm: Introduce prepare_host_addr

2023-05-06 Thread Richard Henderson
Merge tcg_out_tlb_load, add_qemu_ldst_label, and some code that lived
in both tcg_out_qemu_ld and tcg_out_qemu_st into one function that
returns HostAddress and TCGLabelQemuLdst structures.

Signed-off-by: Richard Henderson 
---
 tcg/arm/tcg-target.c.inc | 351 ++-
 1 file changed, 159 insertions(+), 192 deletions(-)

diff --git a/tcg/arm/tcg-target.c.inc b/tcg/arm/tcg-target.c.inc
index b6b4ffc546..c744512778 100644
--- a/tcg/arm/tcg-target.c.inc
+++ b/tcg/arm/tcg-target.c.inc
@@ -1434,125 +1434,6 @@ static TCGReg tcg_out_arg_reg64(TCGContext *s, TCGReg argreg,
 }
 }
 
-#define TLB_SHIFT  (CPU_TLB_ENTRY_BITS + CPU_TLB_BITS)
-
-/* We expect to use an 9-bit sign-magnitude negative offset from ENV.  */
-QEMU_BUILD_BUG_ON(TLB_MASK_TABLE_OFS(0) > 0);
-QEMU_BUILD_BUG_ON(TLB_MASK_TABLE_OFS(0) < -256);
-
-/* These offsets are built into the LDRD below.  */
-QEMU_BUILD_BUG_ON(offsetof(CPUTLBDescFast, mask) != 0);
-QEMU_BUILD_BUG_ON(offsetof(CPUTLBDescFast, table) != 4);
-
-/* Load and compare a TLB entry, leaving the flags set.  Returns the register
-   containing the addend of the tlb entry.  Clobbers R0, R1, R2, TMP.  */
-
-static TCGReg tcg_out_tlb_read(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
-   MemOp opc, int mem_index, bool is_load)
-{
-int cmp_off = (is_load ? offsetof(CPUTLBEntry, addr_read)
-   : offsetof(CPUTLBEntry, addr_write));
-int fast_off = TLB_MASK_TABLE_OFS(mem_index);
-unsigned s_mask = (1 << (opc & MO_SIZE)) - 1;
-unsigned a_mask = (1 << get_alignment_bits(opc)) - 1;
-TCGReg t_addr;
-
-/* Load env_tlb(env)->f[mmu_idx].{mask,table} into {r0,r1}.  */
-tcg_out_ldrd_8(s, COND_AL, TCG_REG_R0, TCG_AREG0, fast_off);
-
-/* Extract the tlb index from the address into R0.  */
-tcg_out_dat_reg(s, COND_AL, ARITH_AND, TCG_REG_R0, TCG_REG_R0, addrlo,
-SHIFT_IMM_LSR(TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS));
-
-/*
- * Add the tlb_table pointer, creating the CPUTLBEntry address in R1.
- * Load the tlb comparator into R2/R3 and the fast path addend into R1.
- */
-if (cmp_off == 0) {
-if (TARGET_LONG_BITS == 64) {
-tcg_out_ldrd_rwb(s, COND_AL, TCG_REG_R2, TCG_REG_R1, TCG_REG_R0);
-} else {
-tcg_out_ld32_rwb(s, COND_AL, TCG_REG_R2, TCG_REG_R1, TCG_REG_R0);
-}
-} else {
-tcg_out_dat_reg(s, COND_AL, ARITH_ADD,
-TCG_REG_R1, TCG_REG_R1, TCG_REG_R0, 0);
-if (TARGET_LONG_BITS == 64) {
-tcg_out_ldrd_8(s, COND_AL, TCG_REG_R2, TCG_REG_R1, cmp_off);
-} else {
-tcg_out_ld32_12(s, COND_AL, TCG_REG_R2, TCG_REG_R1, cmp_off);
-}
-}
-
-/* Load the tlb addend.  */
-tcg_out_ld32_12(s, COND_AL, TCG_REG_R1, TCG_REG_R1,
-offsetof(CPUTLBEntry, addend));
-
-/*
- * Check alignment, check comparators.
- * Do this in 2-4 insns.  Use MOVW for v7, if possible,
- * to reduce the number of sequential conditional instructions.
- * Almost all guests have at least 4k pages, which means that we need
- * to clear at least 9 bits even for an 8-byte memory, which means it
- * isn't worth checking for an immediate operand for BIC.
- *
- * For unaligned accesses, test the page of the last unit of alignment.
- * This leaves the least significant alignment bits unchanged, and of
- * course must be zero.
- */
-t_addr = addrlo;
-if (a_mask < s_mask) {
-t_addr = TCG_REG_R0;
-tcg_out_dat_imm(s, COND_AL, ARITH_ADD, t_addr,
-addrlo, s_mask - a_mask);
-}
-if (use_armv7_instructions && TARGET_PAGE_BITS <= 16) {
-tcg_out_movi32(s, COND_AL, TCG_REG_TMP, ~(TARGET_PAGE_MASK | a_mask));
-tcg_out_dat_reg(s, COND_AL, ARITH_BIC, TCG_REG_TMP,
-t_addr, TCG_REG_TMP, 0);
-tcg_out_dat_reg(s, COND_AL, ARITH_CMP, 0, TCG_REG_R2, TCG_REG_TMP, 0);
-} else {
-if (a_mask) {
-tcg_debug_assert(a_mask <= 0xff);
-tcg_out_dat_imm(s, COND_AL, ARITH_TST, 0, addrlo, a_mask);
-}
-tcg_out_dat_reg(s, COND_AL, ARITH_MOV, TCG_REG_TMP, 0, t_addr,
-SHIFT_IMM_LSR(TARGET_PAGE_BITS));
-tcg_out_dat_reg(s, (a_mask ? COND_EQ : COND_AL), ARITH_CMP,
-0, TCG_REG_R2, TCG_REG_TMP,
-SHIFT_IMM_LSL(TARGET_PAGE_BITS));
-}
-
-if (TARGET_LONG_BITS == 64) {
-tcg_out_dat_reg(s, COND_EQ, ARITH_CMP, 0, TCG_REG_R3, addrhi, 0);
-}
-
-return TCG_REG_R1;
-}
-
-/* Record the context of a call to the out of line helper code for the slow
-   path for a load or store, so that we can later generate the correct
-   helper code.  */
-static void add_qemu_ldst_label(TCGContext *s, bool is_ld,
-MemOpIdx oi, TCGType type,
-TCGReg datalo, TC

[PATCH v5 30/30] tcg/s390x: Simplify constraints on qemu_ld/st

2023-05-06 Thread Richard Henderson
Adjust the softmmu tlb to use R0+R1, not any of the normally available
registers.  Since we handle overlap between inputs and helper arguments,
we can allow any allocatable reg.

Signed-off-by: Richard Henderson 
---
 tcg/s390x/tcg-target-con-set.h |  2 --
 tcg/s390x/tcg-target-con-str.h |  1 -
 tcg/s390x/tcg-target.c.inc | 36 --
 3 files changed, 12 insertions(+), 27 deletions(-)

diff --git a/tcg/s390x/tcg-target-con-set.h b/tcg/s390x/tcg-target-con-set.h
index 15f1c55103..ecc079bb6d 100644
--- a/tcg/s390x/tcg-target-con-set.h
+++ b/tcg/s390x/tcg-target-con-set.h
@@ -10,12 +10,10 @@
  * tcg-target-con-str.h; the constraint combination is inclusive or.
  */
 C_O0_I1(r)
-C_O0_I2(L, L)
 C_O0_I2(r, r)
 C_O0_I2(r, ri)
 C_O0_I2(r, rA)
 C_O0_I2(v, r)
-C_O1_I1(r, L)
 C_O1_I1(r, r)
 C_O1_I1(v, r)
 C_O1_I1(v, v)
diff --git a/tcg/s390x/tcg-target-con-str.h b/tcg/s390x/tcg-target-con-str.h
index 6fa64a1ed6..25675b449e 100644
--- a/tcg/s390x/tcg-target-con-str.h
+++ b/tcg/s390x/tcg-target-con-str.h
@@ -9,7 +9,6 @@
  * REGS(letter, register_mask)
  */
 REGS('r', ALL_GENERAL_REGS)
-REGS('L', ALL_GENERAL_REGS & ~SOFTMMU_RESERVE_REGS)
 REGS('v', ALL_VECTOR_REGS)
 REGS('o', 0xaaaa) /* odd numbered general regs */
 
diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
index dd13326670..aacbaf21d5 100644
--- a/tcg/s390x/tcg-target.c.inc
+++ b/tcg/s390x/tcg-target.c.inc
@@ -44,18 +44,6 @@
 #define ALL_GENERAL_REGS MAKE_64BIT_MASK(0, 16)
 #define ALL_VECTOR_REGS  MAKE_64BIT_MASK(32, 32)
 
-/*
- * For softmmu, we need to avoid conflicts with the first 3
- * argument registers to perform the tlb lookup, and to call
- * the helper function.
- */
-#ifdef CONFIG_SOFTMMU
-#define SOFTMMU_RESERVE_REGS MAKE_64BIT_MASK(TCG_REG_R2, 3)
-#else
-#define SOFTMMU_RESERVE_REGS 0
-#endif
-
-
 /* Several places within the instruction set 0 means "no register"
rather than TCG_REG_R0.  */
 #define TCG_REG_NONE0
@@ -1814,13 +1802,13 @@ static TCGLabelQemuLdst *prepare_host_addr(TCGContext *s, HostAddress *h,
 ldst->oi = oi;
 ldst->addrlo_reg = addr_reg;
 
-tcg_out_sh64(s, RSY_SRLG, TCG_REG_R2, addr_reg, TCG_REG_NONE,
+tcg_out_sh64(s, RSY_SRLG, TCG_TMP0, addr_reg, TCG_REG_NONE,
  TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS);
 
 QEMU_BUILD_BUG_ON(TLB_MASK_TABLE_OFS(0) > 0);
 QEMU_BUILD_BUG_ON(TLB_MASK_TABLE_OFS(0) < -(1 << 19));
-tcg_out_insn(s, RXY, NG, TCG_REG_R2, TCG_AREG0, TCG_REG_NONE, mask_off);
-tcg_out_insn(s, RXY, AG, TCG_REG_R2, TCG_AREG0, TCG_REG_NONE, table_off);
+tcg_out_insn(s, RXY, NG, TCG_TMP0, TCG_AREG0, TCG_REG_NONE, mask_off);
+tcg_out_insn(s, RXY, AG, TCG_TMP0, TCG_AREG0, TCG_REG_NONE, table_off);
 
 /*
  * For aligned accesses, we check the first byte and include the alignment
@@ -1830,10 +1818,10 @@ static TCGLabelQemuLdst *prepare_host_addr(TCGContext *s, HostAddress *h,
 a_off = (a_bits >= s_bits ? 0 : s_mask - a_mask);
 tlb_mask = (uint64_t)TARGET_PAGE_MASK | a_mask;
 if (a_off == 0) {
-tgen_andi_risbg(s, TCG_REG_R3, addr_reg, tlb_mask);
+tgen_andi_risbg(s, TCG_REG_R0, addr_reg, tlb_mask);
 } else {
-tcg_out_insn(s, RX, LA, TCG_REG_R3, addr_reg, TCG_REG_NONE, a_off);
-tgen_andi(s, TCG_TYPE_TL, TCG_REG_R3, tlb_mask);
+tcg_out_insn(s, RX, LA, TCG_REG_R0, addr_reg, TCG_REG_NONE, a_off);
+tgen_andi(s, TCG_TYPE_TL, TCG_REG_R0, tlb_mask);
 }
 
 if (is_ld) {
@@ -1842,16 +1830,16 @@ static TCGLabelQemuLdst *prepare_host_addr(TCGContext *s, HostAddress *h,
 ofs = offsetof(CPUTLBEntry, addr_write);
 }
 if (TARGET_LONG_BITS == 32) {
-tcg_out_insn(s, RX, C, TCG_REG_R3, TCG_REG_R2, TCG_REG_NONE, ofs);
+tcg_out_insn(s, RX, C, TCG_REG_R0, TCG_TMP0, TCG_REG_NONE, ofs);
 } else {
-tcg_out_insn(s, RXY, CG, TCG_REG_R3, TCG_REG_R2, TCG_REG_NONE, ofs);
+tcg_out_insn(s, RXY, CG, TCG_REG_R0, TCG_TMP0, TCG_REG_NONE, ofs);
 }
 
 tcg_out16(s, RI_BRC | (S390_CC_NE << 4));
 ldst->label_ptr[0] = s->code_ptr++;
 
-h->index = TCG_REG_R2;
-tcg_out_insn(s, RXY, LG, h->index, TCG_REG_R2, TCG_REG_NONE,
+h->index = TCG_TMP0;
+tcg_out_insn(s, RXY, LG, h->index, TCG_TMP0, TCG_REG_NONE,
  offsetof(CPUTLBEntry, addend));
 
 if (TARGET_LONG_BITS == 32) {
@@ -3155,10 +3143,10 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 
 case INDEX_op_qemu_ld_i32:
 case INDEX_op_qemu_ld_i64:
-return C_O1_I1(r, L);
+return C_O1_I1(r, r);
 case INDEX_op_qemu_st_i64:
 case INDEX_op_qemu_st_i32:
-return C_O0_I2(L, L);
+return C_O0_I2(r, r);
 
 case INDEX_op_deposit_i32:
 case INDEX_op_deposit_i64:
-- 
2.34.1




[PATCH v5 28/30] tcg/riscv: Simplify constraints on qemu_ld/st

2023-05-06 Thread Richard Henderson
The softmmu tlb uses TCG_REG_TMP[0-2], not any of the normally available
registers.  Now that we handle overlap between inputs and helper arguments,
we can allow any allocatable reg.

Reviewed-by: Daniel Henrique Barboza 
Signed-off-by: Richard Henderson 
---
 tcg/riscv/tcg-target-con-set.h |  2 --
 tcg/riscv/tcg-target-con-str.h |  1 -
 tcg/riscv/tcg-target.c.inc | 16 +++-
 3 files changed, 3 insertions(+), 16 deletions(-)

diff --git a/tcg/riscv/tcg-target-con-set.h b/tcg/riscv/tcg-target-con-set.h
index d4cff673b0..d8d3ac 100644
--- a/tcg/riscv/tcg-target-con-set.h
+++ b/tcg/riscv/tcg-target-con-set.h
@@ -10,10 +10,8 @@
  * tcg-target-con-str.h; the constraint combination is inclusive or.
  */
 C_O0_I1(r)
-C_O0_I2(LZ, L)
 C_O0_I2(rZ, r)
 C_O0_I2(rZ, rZ)
-C_O1_I1(r, L)
 C_O1_I1(r, r)
 C_O1_I2(r, r, ri)
 C_O1_I2(r, r, rI)
diff --git a/tcg/riscv/tcg-target-con-str.h b/tcg/riscv/tcg-target-con-str.h
index 8d8afaee53..6f1cfb976c 100644
--- a/tcg/riscv/tcg-target-con-str.h
+++ b/tcg/riscv/tcg-target-con-str.h
@@ -9,7 +9,6 @@
  * REGS(letter, register_mask)
  */
 REGS('r', ALL_GENERAL_REGS)
-REGS('L', ALL_GENERAL_REGS & ~SOFTMMU_RESERVE_REGS)
 
 /*
  * Define constraint letters for constants:
diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index c22d1e35ac..d12b824d8c 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -125,17 +125,7 @@ static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot)
 #define TCG_CT_CONST_N12   0x400
 #define TCG_CT_CONST_M12   0x800
 
-#define ALL_GENERAL_REGS  MAKE_64BIT_MASK(0, 32)
-/*
- * For softmmu, we need to avoid conflicts with the first 5
- * argument registers to call the helper.  Some of these are
- * also used for the tlb lookup.
- */
-#ifdef CONFIG_SOFTMMU
-#define SOFTMMU_RESERVE_REGS  MAKE_64BIT_MASK(TCG_REG_A0, 5)
-#else
-#define SOFTMMU_RESERVE_REGS  0
-#endif
+#define ALL_GENERAL_REGS   MAKE_64BIT_MASK(0, 32)
 
 #define sextreg  sextract64
 
@@ -1600,10 +1590,10 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 
 case INDEX_op_qemu_ld_i32:
 case INDEX_op_qemu_ld_i64:
-return C_O1_I1(r, L);
+return C_O1_I1(r, r);
 case INDEX_op_qemu_st_i32:
 case INDEX_op_qemu_st_i64:
-return C_O0_I2(LZ, L);
+return C_O0_I2(rZ, r);
 
 default:
 g_assert_not_reached();
-- 
2.34.1




[PATCH v5 11/30] tcg/i386: Convert tcg_out_qemu_ld_slow_path

2023-05-06 Thread Richard Henderson
Use tcg_out_ld_helper_args and tcg_out_ld_helper_ret.

Signed-off-by: Richard Henderson 
---
 tcg/i386/tcg-target.c.inc | 71 +++
 1 file changed, 28 insertions(+), 43 deletions(-)

diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index 8752968af2..17ad3c5963 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -1802,13 +1802,37 @@ static void * const qemu_st_helpers[(MO_SIZE | MO_BSWAP) + 1] = {
 [MO_BEUQ] = helper_be_stq_mmu,
 };
 
+/*
+ * Because i686 has no register parameters and because x86_64 has xchg
+ * to handle addr/data register overlap, we have placed all input arguments
+ * before we might need a scratch reg.
+ *
+ * Even then, a scratch is only needed for l->raddr.  Rather than expose
+ * a general-purpose scratch when we don't actually know it's available,
+ * use the ra_gen hook to load into RAX if needed.
+ */
+#if TCG_TARGET_REG_BITS == 64
+static TCGReg ldst_ra_gen(TCGContext *s, const TCGLabelQemuLdst *l, int arg)
+{
+if (arg < 0) {
+arg = TCG_REG_RAX;
+}
+tcg_out_movi(s, TCG_TYPE_PTR, arg, (uintptr_t)l->raddr);
+return arg;
+}
+static const TCGLdstHelperParam ldst_helper_param = {
+.ra_gen = ldst_ra_gen
+};
+#else
+static const TCGLdstHelperParam ldst_helper_param = { };
+#endif
+
 /*
  * Generate code for the slow path for a load at the end of block
  */
 static bool tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
 {
-MemOpIdx oi = l->oi;
-MemOp opc = get_memop(oi);
+MemOp opc = get_memop(l->oi);
 tcg_insn_unit **label_ptr = &l->label_ptr[0];
 
 /* resolve label address */
@@ -1817,49 +1841,10 @@ static bool tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
 tcg_patch32(label_ptr[1], s->code_ptr - label_ptr[1] - 4);
 }
 
-if (TCG_TARGET_REG_BITS == 32) {
-int ofs = 0;
-
-tcg_out_st(s, TCG_TYPE_PTR, TCG_AREG0, TCG_REG_ESP, ofs);
-ofs += 4;
-
-tcg_out_st(s, TCG_TYPE_I32, l->addrlo_reg, TCG_REG_ESP, ofs);
-ofs += 4;
-
-if (TARGET_LONG_BITS == 64) {
-tcg_out_st(s, TCG_TYPE_I32, l->addrhi_reg, TCG_REG_ESP, ofs);
-ofs += 4;
-}
-
-tcg_out_sti(s, TCG_TYPE_I32, oi, TCG_REG_ESP, ofs);
-ofs += 4;
-
-tcg_out_sti(s, TCG_TYPE_PTR, (uintptr_t)l->raddr, TCG_REG_ESP, ofs);
-} else {
-tcg_out_mov(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[0], TCG_AREG0);
-tcg_out_mov(s, TCG_TYPE_TL, tcg_target_call_iarg_regs[1],
-l->addrlo_reg);
-tcg_out_movi(s, TCG_TYPE_I32, tcg_target_call_iarg_regs[2], oi);
-tcg_out_movi(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[3],
- (uintptr_t)l->raddr);
-}
-
+tcg_out_ld_helper_args(s, l, &ldst_helper_param);
 tcg_out_branch(s, 1, qemu_ld_helpers[opc & (MO_BSWAP | MO_SIZE)]);
+tcg_out_ld_helper_ret(s, l, false, &ldst_helper_param);
 
-if (TCG_TARGET_REG_BITS == 32 && (opc & MO_SIZE) == MO_64) {
-TCGMovExtend ext[2] = {
-{ .dst = l->datalo_reg, .dst_type = TCG_TYPE_I32,
-  .src = TCG_REG_EAX, .src_type = TCG_TYPE_I32, .src_ext = MO_UL },
-{ .dst = l->datahi_reg, .dst_type = TCG_TYPE_I32,
-  .src = TCG_REG_EDX, .src_type = TCG_TYPE_I32, .src_ext = MO_UL },
-};
-tcg_out_movext2(s, &ext[0], &ext[1], -1);
-} else {
-tcg_out_movext(s, l->type, l->datalo_reg,
-   TCG_TYPE_REG, opc & MO_SSIZE, TCG_REG_EAX);
-}
-
-/* Jump to the code corresponding to next IR of qemu_st */
 tcg_out_jmp(s, l->raddr);
 return true;
 }
-- 
2.34.1




[PATCH v5 29/30] tcg/s390x: Use ALGFR in constructing softmmu host address

2023-05-06 Thread Richard Henderson
Rather than zero-extend the guest address into a register,
use an add instruction which zero-extends the second input.
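
For reference (semantics paraphrased from the z/Architecture
definition, not from this patch), ALGFR adds the zero-extended low
32 bits of the second operand into the 64-bit first operand:

    r1 = r1 + (uint64_t)(uint32_t)r2;    /* ALGFR r1,r2 */

Folding the 32-bit guest address into the TLB addend this way makes
the zero-extension free, and h->base can become TCG_REG_NONE.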

Signed-off-by: Richard Henderson 
---
 tcg/s390x/tcg-target.c.inc | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
index dfcf4d9e34..dd13326670 100644
--- a/tcg/s390x/tcg-target.c.inc
+++ b/tcg/s390x/tcg-target.c.inc
@@ -149,6 +149,7 @@ typedef enum S390Opcode {
 RRE_ALGR= 0xb90a,
 RRE_ALCR= 0xb998,
 RRE_ALCGR   = 0xb988,
+RRE_ALGFR   = 0xb91a,
 RRE_CGR = 0xb920,
 RRE_CLGR= 0xb921,
 RRE_DLGR= 0xb987,
@@ -1853,10 +1854,11 @@ static TCGLabelQemuLdst *prepare_host_addr(TCGContext *s, HostAddress *h,
 tcg_out_insn(s, RXY, LG, h->index, TCG_REG_R2, TCG_REG_NONE,
  offsetof(CPUTLBEntry, addend));
 
-h->base = addr_reg;
 if (TARGET_LONG_BITS == 32) {
-tcg_out_ext32u(s, TCG_REG_R3, addr_reg);
-h->base = TCG_REG_R3;
+tcg_out_insn(s, RRE, ALGFR, h->index, addr_reg);
+h->base = TCG_REG_NONE;
+} else {
+h->base = addr_reg;
 }
 h->disp = 0;
 #else
-- 
2.34.1




[PATCH v2] MAINTAINERS: Update Akihiko Odaki's email address

2023-05-06 Thread Akihiko Odaki
From: Akihiko Odaki 

I am now employed by Daynix. Although my role as a reviewer of
macOS-related changes is not very relevant to the employment, I decided
to use the company email address to avoid confusion arising from
different addresses.

Signed-off-by: Akihiko Odaki 
Reviewed-by: Marc-André Lureau 
Reviewed-by: Philippe Mathieu-Daudé 
---
V1 -> V2: This patch got missed so I rebased it and am resubmitting.

 MAINTAINERS | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 55102f4761..2b89e5dd00 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2566,7 +2566,7 @@ Core Audio framework backend
 M: Gerd Hoffmann 
 M: Philippe Mathieu-Daudé 
 R: Christian Schoenebeck 
-R: Akihiko Odaki 
+R: Akihiko Odaki 
 S: Odd Fixes
 F: audio/coreaudio.c
 
@@ -2850,7 +2850,7 @@ F: docs/devel/ui.rst
 Cocoa graphics
 M: Peter Maydell 
 M: Philippe Mathieu-Daudé 
-R: Akihiko Odaki 
+R: Akihiko Odaki 
 S: Odd Fixes
 F: ui/cocoa.m
 
-- 
2.40.1




[PATCH v5 06/30] tcg/mips: Introduce prepare_host_addr

2023-05-06 Thread Richard Henderson
Merge tcg_out_tlb_load, add_qemu_ldst_label, tcg_out_test_alignment,
and some code that lived in both tcg_out_qemu_ld and tcg_out_qemu_st
into one function that returns HostAddress and TCGLabelQemuLdst structures.

Signed-off-by: Richard Henderson 
---
 tcg/mips/tcg-target.c.inc | 404 --
 1 file changed, 172 insertions(+), 232 deletions(-)

diff --git a/tcg/mips/tcg-target.c.inc b/tcg/mips/tcg-target.c.inc
index ef8350e9cd..94708e6ea7 100644
--- a/tcg/mips/tcg-target.c.inc
+++ b/tcg/mips/tcg-target.c.inc
@@ -1181,120 +1181,6 @@ static int tcg_out_call_iarg_reg2(TCGContext *s, int i, TCGReg al, TCGReg ah)
 return i;
 }
 
-/* We expect to use a 16-bit negative offset from ENV.  */
-QEMU_BUILD_BUG_ON(TLB_MASK_TABLE_OFS(0) > 0);
-QEMU_BUILD_BUG_ON(TLB_MASK_TABLE_OFS(0) < -32768);
-
-/*
- * Perform the tlb comparison operation.
- * The complete host address is placed in BASE.
- * Clobbers TMP0, TMP1, TMP2, TMP3.
- */
-static void tcg_out_tlb_load(TCGContext *s, TCGReg base, TCGReg addrl,
- TCGReg addrh, MemOpIdx oi,
- tcg_insn_unit *label_ptr[2], bool is_load)
-{
-MemOp opc = get_memop(oi);
-unsigned a_bits = get_alignment_bits(opc);
-unsigned s_bits = opc & MO_SIZE;
-unsigned a_mask = (1 << a_bits) - 1;
-unsigned s_mask = (1 << s_bits) - 1;
-int mem_index = get_mmuidx(oi);
-int fast_off = TLB_MASK_TABLE_OFS(mem_index);
-int mask_off = fast_off + offsetof(CPUTLBDescFast, mask);
-int table_off = fast_off + offsetof(CPUTLBDescFast, table);
-int add_off = offsetof(CPUTLBEntry, addend);
-int cmp_off = (is_load ? offsetof(CPUTLBEntry, addr_read)
-   : offsetof(CPUTLBEntry, addr_write));
-target_ulong tlb_mask;
-
-/* Load tlb_mask[mmu_idx] and tlb_table[mmu_idx].  */
-tcg_out_ld(s, TCG_TYPE_PTR, TCG_TMP0, TCG_AREG0, mask_off);
-tcg_out_ld(s, TCG_TYPE_PTR, TCG_TMP1, TCG_AREG0, table_off);
-
-/* Extract the TLB index from the address into TMP3.  */
-tcg_out_opc_sa(s, ALIAS_TSRL, TCG_TMP3, addrl,
-   TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS);
-tcg_out_opc_reg(s, OPC_AND, TCG_TMP3, TCG_TMP3, TCG_TMP0);
-
-/* Add the tlb_table pointer, creating the CPUTLBEntry address in TMP3.  */
-tcg_out_opc_reg(s, ALIAS_PADD, TCG_TMP3, TCG_TMP3, TCG_TMP1);
-
-/* Load the (low-half) tlb comparator.  */
-if (TCG_TARGET_REG_BITS < TARGET_LONG_BITS) {
-tcg_out_ldst(s, OPC_LW, TCG_TMP0, TCG_TMP3, cmp_off + LO_OFF);
-} else {
-tcg_out_ldst(s, (TARGET_LONG_BITS == 64 ? OPC_LD
- : TCG_TARGET_REG_BITS == 64 ? OPC_LWU : OPC_LW),
- TCG_TMP0, TCG_TMP3, cmp_off);
-}
-
-/* Zero extend a 32-bit guest address for a 64-bit host. */
-if (TCG_TARGET_REG_BITS > TARGET_LONG_BITS) {
-tcg_out_ext32u(s, base, addrl);
-addrl = base;
-}
-
-/*
- * Mask the page bits, keeping the alignment bits to compare against.
- * For unaligned accesses, compare against the end of the access to
- * verify that it does not cross a page boundary.
- */
-tlb_mask = (target_ulong)TARGET_PAGE_MASK | a_mask;
-tcg_out_movi(s, TCG_TYPE_I32, TCG_TMP1, tlb_mask);
-if (a_mask >= s_mask) {
-tcg_out_opc_reg(s, OPC_AND, TCG_TMP1, TCG_TMP1, addrl);
-} else {
-tcg_out_opc_imm(s, ALIAS_PADDI, TCG_TMP2, addrl, s_mask - a_mask);
-tcg_out_opc_reg(s, OPC_AND, TCG_TMP1, TCG_TMP1, TCG_TMP2);
-}
-
-if (TCG_TARGET_REG_BITS >= TARGET_LONG_BITS) {
-/* Load the tlb addend for the fast path.  */
-tcg_out_ld(s, TCG_TYPE_PTR, TCG_TMP2, TCG_TMP3, add_off);
-}
-
-label_ptr[0] = s->code_ptr;
-tcg_out_opc_br(s, OPC_BNE, TCG_TMP1, TCG_TMP0);
-
-/* Load and test the high half tlb comparator.  */
-if (TCG_TARGET_REG_BITS < TARGET_LONG_BITS) {
-/* delay slot */
-tcg_out_ldst(s, OPC_LW, TCG_TMP0, TCG_TMP3, cmp_off + HI_OFF);
-
-/* Load the tlb addend for the fast path.  */
-tcg_out_ld(s, TCG_TYPE_PTR, TCG_TMP2, TCG_TMP3, add_off);
-
-label_ptr[1] = s->code_ptr;
-tcg_out_opc_br(s, OPC_BNE, addrh, TCG_TMP0);
-}
-
-/* delay slot */
-tcg_out_opc_reg(s, ALIAS_PADD, base, TCG_TMP2, addrl);
-}
-
-static void add_qemu_ldst_label(TCGContext *s, int is_ld, MemOpIdx oi,
-TCGType ext,
-TCGReg datalo, TCGReg datahi,
-TCGReg addrlo, TCGReg addrhi,
-void *raddr, tcg_insn_unit *label_ptr[2])
-{
-TCGLabelQemuLdst *label = new_ldst_label(s);
-
-label->is_ld = is_ld;
-label->oi = oi;
-label->type = ext;
-label->datalo_reg = datalo;
-label->datahi_reg = datahi;
-label->addrlo_reg = addrlo;
-label->addrhi_reg = addrhi;
-label->raddr = tcg_splitwx_to_rx(raddr);
-label->label_ptr[0] = label_ptr[0];
-if (T

[PATCH v5 25/30] tcg/ppc: Adjust constraints on qemu_ld/st

2023-05-06 Thread Richard Henderson
The softmmu tlb uses TCG_REG_{TMP1,TMP2,R0}, not any of the normally
available registers.  Now that we handle overlap between inputs and
helper arguments, we can allow any allocatable reg.

Reviewed-by: Daniel Henrique Barboza 
Signed-off-by: Richard Henderson 
---
 tcg/ppc/tcg-target-con-set.h | 11 ---
 tcg/ppc/tcg-target-con-str.h |  2 --
 tcg/ppc/tcg-target.c.inc | 32 ++--
 3 files changed, 14 insertions(+), 31 deletions(-)

diff --git a/tcg/ppc/tcg-target-con-set.h b/tcg/ppc/tcg-target-con-set.h
index a1a345883d..f206b29205 100644
--- a/tcg/ppc/tcg-target-con-set.h
+++ b/tcg/ppc/tcg-target-con-set.h
@@ -12,18 +12,15 @@
 C_O0_I1(r)
 C_O0_I2(r, r)
 C_O0_I2(r, ri)
-C_O0_I2(S, S)
 C_O0_I2(v, r)
-C_O0_I3(S, S, S)
+C_O0_I3(r, r, r)
 C_O0_I4(r, r, ri, ri)
-C_O0_I4(S, S, S, S)
-C_O1_I1(r, L)
+C_O0_I4(r, r, r, r)
 C_O1_I1(r, r)
 C_O1_I1(v, r)
 C_O1_I1(v, v)
 C_O1_I1(v, vr)
 C_O1_I2(r, 0, rZ)
-C_O1_I2(r, L, L)
 C_O1_I2(r, rI, ri)
 C_O1_I2(r, rI, rT)
 C_O1_I2(r, r, r)
@@ -36,7 +33,7 @@ C_O1_I2(v, v, v)
 C_O1_I3(v, v, v, v)
 C_O1_I4(r, r, ri, rZ, rZ)
 C_O1_I4(r, r, r, ri, ri)
-C_O2_I1(L, L, L)
-C_O2_I2(L, L, L, L)
+C_O2_I1(r, r, r)
+C_O2_I2(r, r, r, r)
 C_O2_I4(r, r, rI, rZM, r, r)
 C_O2_I4(r, r, r, r, rI, rZM)
diff --git a/tcg/ppc/tcg-target-con-str.h b/tcg/ppc/tcg-target-con-str.h
index 298ca20d5b..f3bf030bc3 100644
--- a/tcg/ppc/tcg-target-con-str.h
+++ b/tcg/ppc/tcg-target-con-str.h
@@ -14,8 +14,6 @@ REGS('A', 1u << TCG_REG_R3)
 REGS('B', 1u << TCG_REG_R4)
 REGS('C', 1u << TCG_REG_R5)
 REGS('D', 1u << TCG_REG_R6)
-REGS('L', ALL_QLOAD_REGS)
-REGS('S', ALL_QSTORE_REGS)
 
 /*
  * Define constraint letters for constants:
diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
index 6850ecbc80..5a4ec0470a 100644
--- a/tcg/ppc/tcg-target.c.inc
+++ b/tcg/ppc/tcg-target.c.inc
@@ -93,18 +93,6 @@
 #define ALL_GENERAL_REGS  0xffffffffu
 #define ALL_VECTOR_REGS   0xffffffff00000000ull
 
-#ifdef CONFIG_SOFTMMU
-#define ALL_QLOAD_REGS \
-(ALL_GENERAL_REGS & \
- ~((1 << TCG_REG_R3) | (1 << TCG_REG_R4) | (1 << TCG_REG_R5)))
-#define ALL_QSTORE_REGS \
-(ALL_GENERAL_REGS & ~((1 << TCG_REG_R3) | (1 << TCG_REG_R4) | \
-  (1 << TCG_REG_R5) | (1 << TCG_REG_R6)))
-#else
-#define ALL_QLOAD_REGS  (ALL_GENERAL_REGS & ~(1 << TCG_REG_R3))
-#define ALL_QSTORE_REGS ALL_QLOAD_REGS
-#endif
-
 TCGPowerISA have_isa;
 static bool have_isel;
 bool have_altivec;
@@ -3752,23 +3740,23 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 
 case INDEX_op_qemu_ld_i32:
 return (TCG_TARGET_REG_BITS == 64 || TARGET_LONG_BITS == 32
-? C_O1_I1(r, L)
-: C_O1_I2(r, L, L));
+? C_O1_I1(r, r)
+: C_O1_I2(r, r, r));
 
 case INDEX_op_qemu_st_i32:
 return (TCG_TARGET_REG_BITS == 64 || TARGET_LONG_BITS == 32
-? C_O0_I2(S, S)
-: C_O0_I3(S, S, S));
+? C_O0_I2(r, r)
+: C_O0_I3(r, r, r));
 
 case INDEX_op_qemu_ld_i64:
-return (TCG_TARGET_REG_BITS == 64 ? C_O1_I1(r, L)
-: TARGET_LONG_BITS == 32 ? C_O2_I1(L, L, L)
-: C_O2_I2(L, L, L, L));
+return (TCG_TARGET_REG_BITS == 64 ? C_O1_I1(r, r)
+: TARGET_LONG_BITS == 32 ? C_O2_I1(r, r, r)
+: C_O2_I2(r, r, r, r));
 
 case INDEX_op_qemu_st_i64:
-return (TCG_TARGET_REG_BITS == 64 ? C_O0_I2(S, S)
-: TARGET_LONG_BITS == 32 ? C_O0_I3(S, S, S)
-: C_O0_I4(S, S, S, S));
+return (TCG_TARGET_REG_BITS == 64 ? C_O0_I2(r, r)
+: TARGET_LONG_BITS == 32 ? C_O0_I3(r, r, r)
+: C_O0_I4(r, r, r, r));
 
 case INDEX_op_add_vec:
 case INDEX_op_sub_vec:
-- 
2.34.1




[PATCH v5 27/30] tcg/ppc: Remove unused constraint J

2023-05-06 Thread Richard Henderson
Never used since its introduction.

Fixes: 3d582c6179c ("tcg-ppc64: Rearrange integer constant constraints")
Signed-off-by: Richard Henderson 
---
 tcg/ppc/tcg-target-con-str.h | 1 -
 tcg/ppc/tcg-target.c.inc | 3 ---
 2 files changed, 4 deletions(-)

diff --git a/tcg/ppc/tcg-target-con-str.h b/tcg/ppc/tcg-target-con-str.h
index 9dcbc3df50..094613cbcb 100644
--- a/tcg/ppc/tcg-target-con-str.h
+++ b/tcg/ppc/tcg-target-con-str.h
@@ -16,7 +16,6 @@ REGS('v', ALL_VECTOR_REGS)
  * CONST(letter, TCG_CT_CONST_* bit set)
  */
 CONST('I', TCG_CT_CONST_S16)
-CONST('J', TCG_CT_CONST_U16)
 CONST('M', TCG_CT_CONST_MONE)
 CONST('T', TCG_CT_CONST_S32)
 CONST('U', TCG_CT_CONST_U32)
diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
index 5a4ec0470a..0a14c3e997 100644
--- a/tcg/ppc/tcg-target.c.inc
+++ b/tcg/ppc/tcg-target.c.inc
@@ -83,7 +83,6 @@
 #define SZR  (TCG_TARGET_REG_BITS / 8)
 
 #define TCG_CT_CONST_S16  0x100
-#define TCG_CT_CONST_U16  0x200
 #define TCG_CT_CONST_S32  0x400
 #define TCG_CT_CONST_U32  0x800
 #define TCG_CT_CONST_ZERO 0x1000
@@ -270,8 +269,6 @@ static bool tcg_target_const_match(int64_t val, TCGType type, int ct)
 
 if ((ct & TCG_CT_CONST_S16) && val == (int16_t)val) {
 return 1;
-} else if ((ct & TCG_CT_CONST_U16) && val == (uint16_t)val) {
-return 1;
 } else if ((ct & TCG_CT_CONST_S32) && val == (int32_t)val) {
 return 1;
 } else if ((ct & TCG_CT_CONST_U32) && val == (uint32_t)val) {
-- 
2.34.1




Re: [PATCH v2] target/ppc: Fix fallback to MFSS for MFFS* instructions on pre 3.0 ISAs

2023-05-06 Thread Richard Henderson

On 5/6/23 07:52, Richard Purdie wrote:

The following commits changed the code such that the fallback to MFSS for 
MFFSCRN,
MFFSCRNI, MFFSCE and MFFSL on pre 3.0 ISAs was removed and became an illegal 
instruction:

   bf8adfd88b547680aa857c46098f3a1e94373160 - target/ppc: Move mffscrn[i] to 
decodetree
   394c2e2fda70da722f20fb60412d6c0ca4bfaa03 - target/ppc: Move mffsce to 
decodetree
   3e5bce70efe6bd1f684efbb21fd2a316cbf0657e - target/ppc: Move mffsl to 
decodetree

The hardware will handle them as an MFFS instruction, as the code did previously.
This meant that applications were segfaulting under qemu when encountering these
instructions, which are used in glibc's libm functions, for example.

The fallback for MFFSCDRN and MFFSCDRNI added in a later patch was also missing.

This patch restores the fallback to MFSS for these instructions on pre-3.0 ISAs,
as the hardware decoder would, fixing the segfaulting libm code. It also ensures
the MFSS instruction is used for currently reserved bits, to handle other potential
ISA additions more correctly.

Signed-off-by: Richard Purdie
---
  target/ppc/insn32.decode   | 19 ---
  target/ppc/translate/fp-impl.c.inc | 30 --
  2 files changed, 36 insertions(+), 13 deletions(-)

v2 - switch to use decodetree pattern groups per feedback


Reviewed-by: Richard Henderson 

r~



[PATCH v5 08/30] tcg/riscv: Introduce prepare_host_addr

2023-05-06 Thread Richard Henderson
Merge tcg_out_tlb_load, add_qemu_ldst_label, tcg_out_test_alignment,
and some code that lived in both tcg_out_qemu_ld and tcg_out_qemu_st
into one function that returns TCGReg and TCGLabelQemuLdst.
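
For illustration, the merged helper takes roughly this shape (a sketch
assumed for illustration only; see the full patch for the exact signature):

    /* Returns the host base register; *pl receives the slow-path label. */
    static TCGReg prepare_host_addr(TCGContext *s, TCGLabelQemuLdst **pl,
                                    TCGReg addr_reg, MemOpIdx oi,
                                    bool is_ld);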

Signed-off-by: Richard Henderson 
---
 tcg/riscv/tcg-target.c.inc | 253 +
 1 file changed, 114 insertions(+), 139 deletions(-)

diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index a4cf60ca75..2b2d313fe2 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -899,10 +899,6 @@ static void * const qemu_st_helpers[MO_SIZE + 1] = {
 #endif
 };
 
-/* We expect to use a 12-bit negative offset from ENV.  */
-QEMU_BUILD_BUG_ON(TLB_MASK_TABLE_OFS(0) > 0);
-QEMU_BUILD_BUG_ON(TLB_MASK_TABLE_OFS(0) < -(1 << 11));
-
 static void tcg_out_goto(TCGContext *s, const tcg_insn_unit *target)
 {
 tcg_out_opc_jump(s, OPC_JAL, TCG_REG_ZERO, 0);
@@ -910,76 +906,6 @@ static void tcg_out_goto(TCGContext *s, const 
tcg_insn_unit *target)
 tcg_debug_assert(ok);
 }
 
-static TCGReg tcg_out_tlb_load(TCGContext *s, TCGReg addr, MemOpIdx oi,
-   tcg_insn_unit **label_ptr, bool is_load)
-{
-MemOp opc = get_memop(oi);
-unsigned s_bits = opc & MO_SIZE;
-unsigned a_bits = get_alignment_bits(opc);
-tcg_target_long compare_mask;
-int mem_index = get_mmuidx(oi);
-int fast_ofs = TLB_MASK_TABLE_OFS(mem_index);
-int mask_ofs = fast_ofs + offsetof(CPUTLBDescFast, mask);
-int table_ofs = fast_ofs + offsetof(CPUTLBDescFast, table);
-TCGReg mask_base = TCG_AREG0, table_base = TCG_AREG0;
-
-tcg_out_ld(s, TCG_TYPE_PTR, TCG_REG_TMP0, mask_base, mask_ofs);
-tcg_out_ld(s, TCG_TYPE_PTR, TCG_REG_TMP1, table_base, table_ofs);
-
-tcg_out_opc_imm(s, OPC_SRLI, TCG_REG_TMP2, addr,
-TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS);
-tcg_out_opc_reg(s, OPC_AND, TCG_REG_TMP2, TCG_REG_TMP2, TCG_REG_TMP0);
-tcg_out_opc_reg(s, OPC_ADD, TCG_REG_TMP2, TCG_REG_TMP2, TCG_REG_TMP1);
-
-/* Load the tlb comparator and the addend.  */
-tcg_out_ld(s, TCG_TYPE_TL, TCG_REG_TMP0, TCG_REG_TMP2,
-   is_load ? offsetof(CPUTLBEntry, addr_read)
-   : offsetof(CPUTLBEntry, addr_write));
-tcg_out_ld(s, TCG_TYPE_PTR, TCG_REG_TMP2, TCG_REG_TMP2,
-   offsetof(CPUTLBEntry, addend));
-
-/* We don't support unaligned accesses. */
-if (a_bits < s_bits) {
-a_bits = s_bits;
-}
-/* Clear the non-page, non-alignment bits from the address.  */
-compare_mask = (tcg_target_long)TARGET_PAGE_MASK | ((1 << a_bits) - 1);
-if (compare_mask == sextreg(compare_mask, 0, 12)) {
-tcg_out_opc_imm(s, OPC_ANDI, TCG_REG_TMP1, addr, compare_mask);
-} else {
-tcg_out_movi(s, TCG_TYPE_TL, TCG_REG_TMP1, compare_mask);
-tcg_out_opc_reg(s, OPC_AND, TCG_REG_TMP1, TCG_REG_TMP1, addr);
-}
-
-/* Compare masked address with the TLB entry. */
-label_ptr[0] = s->code_ptr;
-tcg_out_opc_branch(s, OPC_BNE, TCG_REG_TMP0, TCG_REG_TMP1, 0);
-
-/* TLB Hit - translate address using addend.  */
-if (TARGET_LONG_BITS == 32) {
-tcg_out_ext32u(s, TCG_REG_TMP0, addr);
-addr = TCG_REG_TMP0;
-}
-tcg_out_opc_reg(s, OPC_ADD, TCG_REG_TMP0, TCG_REG_TMP2, addr);
-return TCG_REG_TMP0;
-}
-
-static void add_qemu_ldst_label(TCGContext *s, int is_ld, MemOpIdx oi,
-TCGType data_type, TCGReg data_reg,
-TCGReg addr_reg, void *raddr,
-tcg_insn_unit **label_ptr)
-{
-TCGLabelQemuLdst *label = new_ldst_label(s);
-
-label->is_ld = is_ld;
-label->oi = oi;
-label->type = data_type;
-label->datalo_reg = data_reg;
-label->addrlo_reg = addr_reg;
-label->raddr = tcg_splitwx_to_rx(raddr);
-label->label_ptr[0] = label_ptr[0];
-}
-
 static bool tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
 {
 MemOpIdx oi = l->oi;
@@ -1037,26 +963,6 @@ static bool tcg_out_qemu_st_slow_path(TCGContext *s, 
TCGLabelQemuLdst *l)
 return true;
 }
 #else
-
-static void tcg_out_test_alignment(TCGContext *s, bool is_ld, TCGReg addr_reg,
-   unsigned a_bits)
-{
-unsigned a_mask = (1 << a_bits) - 1;
-TCGLabelQemuLdst *l = new_ldst_label(s);
-
-l->is_ld = is_ld;
-l->addrlo_reg = addr_reg;
-
-/* We are expecting a_bits to max out at 7, so we can always use andi. */
-tcg_debug_assert(a_bits < 12);
-tcg_out_opc_imm(s, OPC_ANDI, TCG_REG_TMP1, addr_reg, a_mask);
-
-l->label_ptr[0] = s->code_ptr;
-tcg_out_opc_branch(s, OPC_BNE, TCG_REG_TMP1, TCG_REG_ZERO, 0);
-
-l->raddr = tcg_splitwx_to_rx(s->code_ptr);
-}
-
 static bool tcg_out_fail_alignment(TCGContext *s, TCGLabelQemuLdst *l)
 {
 /* resolve label address */
@@ -1083,9 +989,108 @@ static bool tcg_out_qemu_st_slow_path(TCGContext *s, 
TCGLabelQemuLdst *l)
 {
 return tcg_out_fail_alignment(s,

Re: [PATCH v3 04/10] scripts/qapi: document the tool that generated the file

2023-05-06 Thread Markus Armbruster
Alex Bennée  writes:

> This makes it a little easier for developers to find where things
> were being generated.
>
> Reviewed-by: Richard Henderson 
> Signed-off-by: Alex Bennée 
> Message-Id: <20230503091756.1453057-5-alex.ben...@linaro.org>
> ---
>  scripts/qapi/gen.py | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/scripts/qapi/gen.py b/scripts/qapi/gen.py
> index 8f8f784f4a..e724507e1a 100644
> --- a/scripts/qapi/gen.py
> +++ b/scripts/qapi/gen.py
> @@ -162,7 +162,7 @@ def __init__(self, fname: str, blurb: str, pydoc: str):
>  
>  def _top(self) -> str:
>  return mcgen('''
> -/* AUTOMATICALLY GENERATED, DO NOT MODIFY */
> +/* AUTOMATICALLY GENERATED by QAPIGenC, DO NOT MODIFY */
>  
>  /*
>  %(blurb)s
> @@ -195,7 +195,7 @@ def _bottom(self) -> str:
>  
>  class QAPIGenTrace(QAPIGen):
>  def _top(self) -> str:
> -return super()._top() + '# AUTOMATICALLY GENERATED, DO NOT 
> MODIFY\n\n'
> +return super()._top() + '# AUTOMATICALLY GENERATED by QAPIGenTrace, 
> DO NOT MODIFY\n\n'
>  
>  
>  @contextmanager

Nitpicking...  would "GENERATED BY {os.path.basename(sys.argv[0])}" be
more useful?  The people who know what QAPIGenC and QAPIGenTrace mean
are probably the ones that need this warning the least :)




Re: [PULL v2 00/45] loongarch-to-apply queue

2023-05-06 Thread Richard Henderson

On 5/6/23 07:34, Song Gao wrote:

The following changes since commit eb5c3932a383ba1ef3a911232c644f2e053ef66c:

   Merge tag 'pw-pull-request' ofhttps://gitlab.com/marcandre.lureau/qemu  into 
staging (2023-05-05 19:18:05 +0100)

are available in the Git repository at:

   https://gitlab.com/gaosong/qemu.git  tags/pull-loongarch-20230506

for you to fetch changes up to 725d7e763a802321e1bb303348afc551d564d31e:

   hw/intc: don't use target_ulong for LoongArch ipi (2023-05-06 11:19:50 +0800)


Add LoongArch LSX instructions.

v2: Fixes build error.


Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/8.1 as 
appropriate.


r~




Re: [PATCH v10 1/8] memory: prevent dma-reentracy issues

2023-05-06 Thread Song Gao

Hi Alexander,

On 2023/4/28 5:14 PM, Thomas Huth wrote:

On 28/04/2023 11.11, Alexander Bulekov wrote:

On 230428 1015, Thomas Huth wrote:

On 28/04/2023 10.12, Daniel P. Berrangé wrote:

On Thu, Apr 27, 2023 at 05:10:06PM -0400, Alexander Bulekov wrote:
Add a flag to the DeviceState, set when a device is engaged in
PIO/MMIO/DMA.

This flag is set/checked prior to calling a device's MemoryRegion
handlers, and set when device code initiates DMA.  The purpose of 
this

flag is to prevent two types of DMA-based reentrancy issues:

1.) mmio -> dma -> mmio case
2.) bh -> dma write -> mmio case

These issues have led to problems such as stack-exhaustion and
use-after-frees.

Summary of the problem from Peter Maydell:
https://lore.kernel.org/qemu-devel/cafeaca_23vc7he3iam-jva6w38lk4hjowae5kcknhprd5fp...@mail.gmail.com 



Resolves: https://gitlab.com/qemu-project/qemu/-/issues/62
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/540
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/541
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/556
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/557
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/827
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1282
Resolves: CVE-2023-0330

Signed-off-by: Alexander Bulekov 
Reviewed-by: Thomas Huth 
---
   include/exec/memory.h  |  5 +
   include/hw/qdev-core.h |  7 +++
   softmmu/memory.c   | 16 
   3 files changed, 28 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 15ade918ba..e45ce6061f 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -767,6 +767,8 @@ struct MemoryRegion {
   bool is_iommu;
   RAMBlock *ram_block;
   Object *owner;
+    /* owner as TYPE_DEVICE. Used for re-entrancy checks in MR 
access hotpath */

+    DeviceState *dev;
   const MemoryRegionOps *ops;
   void *opaque;
@@ -791,6 +793,9 @@ struct MemoryRegion {
   unsigned ioeventfd_nb;
   MemoryRegionIoeventfd *ioeventfds;
   RamDiscardManager *rdm; /* Only for RAM */
+
+    /* For devices designed to perform re-entrant IO into their 
own IO MRs */

+    bool disable_reentrancy_guard;
   };
   struct IOMMUMemoryRegion {
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index bd50ad5ee1..7623703943 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -162,6 +162,10 @@ struct NamedClockList {
   QLIST_ENTRY(NamedClockList) node;
   };
+typedef struct {
+    bool engaged_in_io;
+} MemReentrancyGuard;
+
   /**
    * DeviceState:
    * @realized: Indicates whether the device has been fully 
constructed.

@@ -194,6 +198,9 @@ struct DeviceState {
   int alias_required_for_version;
   ResettableState reset;
   GSList *unplug_blockers;
+
+    /* Is the device currently in mmio/pio/dma? Used to prevent 
re-entrancy */

+    MemReentrancyGuard mem_reentrancy_guard;
   };
   struct DeviceListener {
diff --git a/softmmu/memory.c b/softmmu/memory.c
index b1a6cae6f5..fe23f0e5ce 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -542,6 +542,18 @@ static MemTxResult 
access_with_adjusted_size(hwaddr addr,

   access_size_max = 4;
   }
+    /* Do not allow more than one simultaneous access to a 
device's IO Regions */

+    if (mr->dev && !mr->disable_reentrancy_guard &&
+    !mr->ram_device && !mr->ram && !mr->rom_device && 
!mr->readonly) {

+    if (mr->dev->mem_reentrancy_guard.engaged_in_io) {
+    warn_report("Blocked re-entrant IO on "
+    "MemoryRegion: %s at addr: 0x%" HWADDR_PRIX,
+    memory_region_name(mr), addr);
+    return MEMTX_ACCESS_ERROR;


If we issue this warn_report on every invalid memory access, is this
going to become a denial of service by flooding logs, or is the
return MEMTX_ACCESS_ERROR, sufficient to ensure this is only printed
*once* in the lifetime of the QEMU process ?


Maybe it's better to use warn_report_once() here instead?


Sounds good - should I respin the series to change this?


Not necessary, I've got v10 already queued, I'll fix it up there

 Thomas

This patch causes the loongarch virtual machine to fail to start the
secondary CPUs.


    ./build/qemu-system-loongarch64 -machine virt -m 8G -cpu la464 \
             -smp 4 -bios QEMU_EFI.fd -kernel vmlinuz.efi -initrd 
ramdisk   \
   -serial stdio   -monitor 
telnet:localhost:4495,server,nowait  \
   -append "root=/dev/ram rdinit=/sbin/init 
console=ttyS0,115200"   --nographic




qemu-system-loongarch64: warning: Blocked re-entrant IO on MemoryRegion: 
loongarch_ipi_iocsr at addr: 0x24



[    0.059284] smp: Bringing up secondary CPUs ...
[    0.062540] Booting CPU#1...
[    5.204340] CPU1: failed to start
[    5.211435] Booting CPU#2...
[   10.465762] CPU2: failed to start
[   10.467757] Booting CPU#3...
[   15.805430] CPU3: failed to start
[   15.805980] smp: Brought up 1 node, 1 CPU
[   15.818832] devtmpfs:

[PATCH] loongarch: mark loongarch_ipi_iocsr re-entrancy safe

2023-05-06 Thread Alexander Bulekov
loongarch_ipi_iocsr MRs rely on re-entrant IO through the ipi_send
function. As such, mark these MRs re-entrancy-safe.

Fixes: a2e1753b80 ("memory: prevent dma-reentracy issues")
Signed-off-by: Alexander Bulekov 
---
 hw/intc/loongarch_ipi.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/hw/intc/loongarch_ipi.c b/hw/intc/loongarch_ipi.c
index bdba0f8107..9de7c01e11 100644
--- a/hw/intc/loongarch_ipi.c
+++ b/hw/intc/loongarch_ipi.c
@@ -215,6 +215,10 @@ static void loongarch_ipi_init(Object *obj)
 for (cpu = 0; cpu < MAX_IPI_CORE_NUM; cpu++) {
 memory_region_init_io(&s->ipi_iocsr_mem[cpu], obj, &loongarch_ipi_ops,
 &lams->ipi_core[cpu], "loongarch_ipi_iocsr", 0x48);
+
+/* loongarch_ipi_iocsr performs re-entrant IO through ipi_send */
+s->ipi_iocsr_mem[cpu].disable_reentrancy_guard = true;
+
 sysbus_init_mmio(sbd, &s->ipi_iocsr_mem[cpu]);
 
 memory_region_init_io(&s->ipi64_iocsr_mem[cpu], obj, 
&loongarch_ipi64_ops,
-- 
2.39.0




[PATCH v2 0/2] Send all the SVQ control commands in parallel

2023-05-06 Thread Hawkins Jiawei
This patchset allows QEMU to poll and check the device used buffers
after sending all SVQ control commands, instead of polling and checking
immediately after sending each SVQ control command, so that QEMU can
send all the SVQ control commands in parallel, which yields better
performance.
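
Roughly, the resulting flow looks like this (an illustrative sketch only,
not the actual code; `n_cmds` and `out_len[]` are hypothetical
placeholders, and error handling is trimmed):

    /* enqueue every control command without waiting for completions */
    for (int i = 0; i < n_cmds; i++) {
        vhost_vdpa_net_cvq_add(s, &out_cursor, out_len[i],
                               &in_cursor, sizeof(virtio_net_ctrl_ack));
    }

    /* then poll once per command still in flight */
    while (cmds_in_flight > 0) {
        if (vhost_svq_poll(svq) != sizeof(virtio_net_ctrl_ack)) {
            return -EINVAL;
        }
        cmds_in_flight--;
    }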

I use vdpa_sim_net to simulate a vdpa device, and refactor
vhost_vdpa_net_load() to call vhost_vdpa_net_load_mac() 30 times,
to build a test environment for sending multiple SVQ control commands.
The monotonic time to finish vhost_vdpa_net_load() is as follows:

QEMU          microseconds
--------------------------
not patched   85.092
patched       79.222

So this is a saving of (85.092 - 79.222)/30 ≈ 0.2 microseconds per command.

This patchset resolves the GitLab issue at
https://gitlab.com/qemu-project/qemu/-/issues/1578.

v2:
  - recover accidentally deleted rows
  - remove extra newline
  - refactor `need_poll_len` to `cmds_in_flight`
  - return -EINVAL when vhost_svq_poll() return 0 or check
on buffers written by device fails
  - change the type of `in_cursor`, and refactor the
code for updating cursor
  - return directly when vhost_vdpa_net_load_{mac,mq}()
returns a failure in vhost_vdpa_net_load()

v1: https://lists.nongnu.org/archive/html/qemu-devel/2023-04/msg02668.html

Hawkins Jiawei (2):
  vdpa: rename vhost_vdpa_net_cvq_add()
  vdpa: send CVQ state load commands in parallel

 net/vhost-vdpa.c | 165 +--
 1 file changed, 130 insertions(+), 35 deletions(-)

-- 
2.25.1




[PATCH v2 2/2] vdpa: send CVQ state load commands in parallel

2023-05-06 Thread Hawkins Jiawei
This patch introduces vhost_vdpa_net_cvq_add() and
refactors vhost_vdpa_net_load*(), so that QEMU can
send CVQ state load commands in parallel.

To be more specific, this patch introduces vhost_vdpa_net_cvq_add()
to add SVQ control commands to the SVQ and kick the device,
without polling the device used buffers. QEMU does not
poll and check the device used buffers in vhost_vdpa_net_load()
until all CVQ state load commands have been sent to the device.

What's more, to avoid the buffer overwriting that would be caused
by using `svq->cvq_cmd_out_buffer` and `svq->status` as the
buffer for all CVQ state load commands when sending
them in parallel, this patch introduces
`out_cursor` and `in_cursor` in vhost_vdpa_net_load(),
pointing to the next available out and in buffers,
so that each CVQ state load command
uses its own unique buffer.
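
Conceptually, the load path then becomes (a simplified sketch; error
handling omitted, and vhost_vdpa_net_load_mq() is assumed to follow the
same pattern as vhost_vdpa_net_load_mac() in the diff below):

    void *out_cursor = s->cvq_cmd_out_buffer;
    virtio_net_ctrl_ack *in_cursor = s->status;

    vhost_vdpa_net_load_mac(s, n, &out_cursor, &in_cursor);
    vhost_vdpa_net_load_mq(s, n, &out_cursor, &in_cursor);
    /* ... finally, poll once for every command still in flight ... */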

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1578
Signed-off-by: Hawkins Jiawei 
---
 net/vhost-vdpa.c | 152 +--
 1 file changed, 120 insertions(+), 32 deletions(-)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 10804c7200..14e31ca5c5 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -590,6 +590,44 @@ static void vhost_vdpa_net_cvq_stop(NetClientState *nc)
 vhost_vdpa_net_client_stop(nc);
 }
 
+/**
+ * vhost_vdpa_net_cvq_add() adds SVQ control commands to SVQ,
+ * kicks the device but does not poll the device used buffers.
+ *
+ * Return the number of elements added to SVQ if success.
+ */
+static int vhost_vdpa_net_cvq_add(VhostVDPAState *s,
+void **out_cursor, size_t out_len,
+virtio_net_ctrl_ack **in_cursor, size_t in_len)
+{
+/* Buffers for the device */
+const struct iovec out = {
+.iov_base = *out_cursor,
+.iov_len = out_len,
+};
+const struct iovec in = {
+.iov_base = *in_cursor,
+.iov_len = sizeof(virtio_net_ctrl_ack),
+};
+VhostShadowVirtqueue *svq = g_ptr_array_index(s->vhost_vdpa.shadow_vqs, 0);
+int r;
+
+r = vhost_svq_add(svq, &out, 1, &in, 1, NULL);
+if (unlikely(r != 0)) {
+if (unlikely(r == -ENOSPC)) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: No space on device queue\n",
+  __func__);
+}
+return r;
+}
+
+/* Update the cursor */
+*out_cursor += out_len;
+*in_cursor += 1;
+
+return 1;
+}
+
 /**
  * vhost_vdpa_net_cvq_add_and_wait() adds SVQ control commands to SVQ,
  * kicks the device and polls the device used buffers.
@@ -628,69 +666,82 @@ static ssize_t 
vhost_vdpa_net_cvq_add_and_wait(VhostVDPAState *s,
 return vhost_svq_poll(svq);
 }
 
-static ssize_t vhost_vdpa_net_load_cmd(VhostVDPAState *s, uint8_t class,
-   uint8_t cmd, const void *data,
-   size_t data_size)
+
+/**
+ * vhost_vdpa_net_load_cmd() restores the NIC state through SVQ.
+ *
+ * Return the number of elements added to SVQ if success.
+ */
+static int vhost_vdpa_net_load_cmd(VhostVDPAState *s,
+void **out_cursor, uint8_t class, uint8_t cmd,
+const void *data, size_t data_size,
+virtio_net_ctrl_ack **in_cursor)
 {
 const struct virtio_net_ctrl_hdr ctrl = {
 .class = class,
 .cmd = cmd,
 };
 
-assert(data_size < vhost_vdpa_net_cvq_cmd_page_len() - sizeof(ctrl));
+assert(sizeof(ctrl) < vhost_vdpa_net_cvq_cmd_page_len() -
+  (*out_cursor - s->cvq_cmd_out_buffer));
+assert(data_size < vhost_vdpa_net_cvq_cmd_page_len() - sizeof(ctrl) -
+   (*out_cursor - s->cvq_cmd_out_buffer));
 
-memcpy(s->cvq_cmd_out_buffer, &ctrl, sizeof(ctrl));
-memcpy(s->cvq_cmd_out_buffer + sizeof(ctrl), data, data_size);
+memcpy(*out_cursor, &ctrl, sizeof(ctrl));
+memcpy(*out_cursor + sizeof(ctrl), data, data_size);
 
-return vhost_vdpa_net_cvq_add_and_wait(s, sizeof(ctrl) + data_size,
-  sizeof(virtio_net_ctrl_ack));
+return vhost_vdpa_net_cvq_add(s, out_cursor, sizeof(ctrl) + data_size,
+  in_cursor, sizeof(virtio_net_ctrl_ack));
 }
 
-static int vhost_vdpa_net_load_mac(VhostVDPAState *s, const VirtIONet *n)
+/**
+ * vhost_vdpa_net_load_mac() restores the NIC mac through SVQ.
+ *
+ * Return the number of elements added to SVQ if success.
+ */
+static int vhost_vdpa_net_load_mac(VhostVDPAState *s, const VirtIONet *n,
+void **out_cursor, virtio_net_ctrl_ack **in_cursor)
 {
 uint64_t features = n->parent_obj.guest_features;
 if (features & BIT_ULL(VIRTIO_NET_F_CTRL_MAC_ADDR)) {
-ssize_t dev_written = vhost_vdpa_net_load_cmd(s, VIRTIO_NET_CTRL_MAC,
-  VIRTIO_NET_CTRL_MAC_ADDR_SET,
-

[PATCH v2 1/2] vdpa: rename vhost_vdpa_net_cvq_add()

2023-05-06 Thread Hawkins Jiawei
We want to introduce a new version of vhost_vdpa_net_cvq_add() that
does not poll immediately after forwarding custom buffers
to the device, so that QEMU can send all the SVQ control commands
in parallel instead of serializing them.

Signed-off-by: Hawkins Jiawei 
---
 net/vhost-vdpa.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 99904a0da7..10804c7200 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -590,8 +590,14 @@ static void vhost_vdpa_net_cvq_stop(NetClientState *nc)
 vhost_vdpa_net_client_stop(nc);
 }
 
-static ssize_t vhost_vdpa_net_cvq_add(VhostVDPAState *s, size_t out_len,
-  size_t in_len)
+/**
+ * vhost_vdpa_net_cvq_add_and_wait() adds SVQ control commands to SVQ,
+ * kicks the device and polls the device used buffers.
+ *
+ * Return the length written by the device.
+ */
+static ssize_t vhost_vdpa_net_cvq_add_and_wait(VhostVDPAState *s,
+size_t out_len, size_t in_len)
 {
 /* Buffers for the device */
 const struct iovec out = {
@@ -636,7 +642,7 @@ static ssize_t vhost_vdpa_net_load_cmd(VhostVDPAState *s, 
uint8_t class,
 memcpy(s->cvq_cmd_out_buffer, &ctrl, sizeof(ctrl));
 memcpy(s->cvq_cmd_out_buffer + sizeof(ctrl), data, data_size);
 
-return vhost_vdpa_net_cvq_add(s, sizeof(ctrl) + data_size,
+return vhost_vdpa_net_cvq_add_and_wait(s, sizeof(ctrl) + data_size,
   sizeof(virtio_net_ctrl_ack));
 }
 
@@ -753,7 +759,8 @@ static int 
vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq,
 dev_written = sizeof(status);
 *s->status = VIRTIO_NET_OK;
 } else {
-dev_written = vhost_vdpa_net_cvq_add(s, out.iov_len, sizeof(status));
+dev_written = vhost_vdpa_net_cvq_add_and_wait(s, out.iov_len,
+  sizeof(status));
 if (unlikely(dev_written < 0)) {
 goto out;
 }
-- 
2.25.1




[PATCH RESEND] vhost: fix possible wrap in SVQ descriptor ring

2023-05-06 Thread Hawkins Jiawei
QEMU invokes vhost_svq_add() when adding a guest's element into the SVQ.
In vhost_svq_add(), it uses vhost_svq_available_slots() to check
whether QEMU can add the element into the SVQ. If there is
enough space, then QEMU combines some out descriptors and
some in descriptors into one descriptor chain, and adds it into
svq->vring.desc by vhost_svq_vring_write_descs().

Yet the problem is that `svq->shadow_avail_idx - svq->shadow_used_idx`
in vhost_svq_available_slots() returns the number of occupied elements,
i.e. the number of descriptor chains, instead of the number of occupied
descriptors, which may cause wrapping in the SVQ descriptor ring.

Here is an example. In vhost_handle_guest_kick(), QEMU forwards
as many available buffers as possible to the device by virtqueue_pop()
and vhost_svq_add_element(). virtqueue_pop() returns a guest's element,
and vhost_svq_add_element(), a wrapper around vhost_svq_add(), adds
this element into the SVQ. If QEMU invokes virtqueue_pop() and
vhost_svq_add_element() `svq->vring.num` times, vhost_svq_available_slots()
thinks QEMU just ran out of slots and everything should work fine.
But in fact, virtqueue_pop() returned `svq->vring.num` elements or
descriptor chains, which can span more than `svq->vring.num` descriptors
due to guest memory fragmentation, and this causes wrapping in the SVQ
descriptor ring.
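
For instance, suppose `svq->vring.num` is 256 and, due to guest memory
fragmentation, every element popped by virtqueue_pop() is made of 2
descriptors (numbers chosen purely for illustration). After 128 elements
all 256 descriptors of the ring are in use, yet `svq->shadow_avail_idx -
svq->shadow_used_idx` is only 128, so vhost_svq_available_slots() still
reports 128 free slots and the next add wraps the descriptor ring.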

Therefore, this patch adds a `num_free` field to the VhostShadowVirtqueue
structure, and updates this field in vhost_svq_add() and
vhost_svq_get_buf(), to record the number of free descriptors.
Then we can avoid wrapping the SVQ descriptor ring by refactoring
vhost_svq_available_slots().

Fixes: 100890f7ca ("vhost: Shadow virtqueue buffers forwarding")
Signed-off-by: Hawkins Jiawei 
---
 hw/virtio/vhost-shadow-virtqueue.c | 9 -
 hw/virtio/vhost-shadow-virtqueue.h | 3 +++
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.c 
b/hw/virtio/vhost-shadow-virtqueue.c
index 8361e70d1b..e1c6952b10 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -68,7 +68,7 @@ bool vhost_svq_valid_features(uint64_t features, Error **errp)
  */
 static uint16_t vhost_svq_available_slots(const VhostShadowVirtqueue *svq)
 {
-return svq->vring.num - (svq->shadow_avail_idx - svq->shadow_used_idx);
+return svq->num_free;
 }
 
 /**
@@ -263,6 +263,9 @@ int vhost_svq_add(VhostShadowVirtqueue *svq, const struct 
iovec *out_sg,
 return -EINVAL;
 }
 
+/* Update the size of SVQ vring free descriptors */
+svq->num_free -= ndescs;
+
 svq->desc_state[qemu_head].elem = elem;
 svq->desc_state[qemu_head].ndescs = ndescs;
 vhost_svq_kick(svq);
@@ -450,6 +453,9 @@ static VirtQueueElement 
*vhost_svq_get_buf(VhostShadowVirtqueue *svq,
 svq->desc_next[last_used_chain] = svq->free_head;
 svq->free_head = used_elem.id;
 
+/* Update the size of SVQ vring free descriptors */
+svq->num_free += num;
+
 *len = used_elem.len;
 return g_steal_pointer(&svq->desc_state[used_elem.id].elem);
 }
@@ -659,6 +665,7 @@ void vhost_svq_start(VhostShadowVirtqueue *svq, 
VirtIODevice *vdev,
 svq->iova_tree = iova_tree;
 
 svq->vring.num = virtio_queue_get_num(vdev, virtio_get_queue_index(vq));
+svq->num_free = svq->vring.num;
 driver_size = vhost_svq_driver_area_size(svq);
 device_size = vhost_svq_device_area_size(svq);
 svq->vring.desc = qemu_memalign(qemu_real_host_page_size(), driver_size);
diff --git a/hw/virtio/vhost-shadow-virtqueue.h 
b/hw/virtio/vhost-shadow-virtqueue.h
index 926a4897b1..6efe051a70 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -107,6 +107,9 @@ typedef struct VhostShadowVirtqueue {
 
 /* Next head to consume from the device */
 uint16_t last_used_idx;
+
+/* Size of SVQ vring free descriptors */
+uint16_t num_free;
 } VhostShadowVirtqueue;
 
 bool vhost_svq_valid_features(uint64_t features, Error **errp);
-- 
2.25.1




[PATCH 05/12] audio/pw: needless check for NULL

2023-05-06 Thread marcandre . lureau
From: Marc-André Lureau 

g_clear_pointer() already checks for NULL.
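
For reference, g_clear_pointer(&ptr, destroy) behaves roughly like the
following sketch (simplified; the real GLib macro also handles type
checking):

    gpointer tmp = ptr;
    ptr = NULL;
    if (tmp != NULL) {
        destroy(tmp);
    }

so the explicit NULL checks removed below were redundant.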

Signed-off-by: Marc-André Lureau 
---
 audio/pwaudio.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/audio/pwaudio.c b/audio/pwaudio.c
index 51cfc0b052..6ca4ef4f62 100644
--- a/audio/pwaudio.c
+++ b/audio/pwaudio.c
@@ -834,12 +834,8 @@ fail:
 if (pw->thread_loop) {
 pw_thread_loop_stop(pw->thread_loop);
 }
-if (pw->context) {
-g_clear_pointer(&pw->context, pw_context_destroy);
-}
-if (pw->thread_loop) {
-g_clear_pointer(&pw->thread_loop, pw_thread_loop_destroy);
-}
+g_clear_pointer(&pw->context, pw_context_destroy);
+g_clear_pointer(&pw->thread_loop, pw_thread_loop_destroy);
 return NULL;
 }
 
-- 
2.40.1




[PATCH 06/12] audio/pw: trace during init before calling pipewire API

2023-05-06 Thread marcandre . lureau
From: Marc-André Lureau 

Signed-off-by: Marc-André Lureau 
---
 audio/pwaudio.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/audio/pwaudio.c b/audio/pwaudio.c
index 6ca4ef4f62..2b12b40934 100644
--- a/audio/pwaudio.c
+++ b/audio/pwaudio.c
@@ -784,10 +784,11 @@ static void *
 qpw_audio_init(Audiodev *dev)
 {
 g_autofree pwaudio *pw = g_new0(pwaudio, 1);
-pw_init(NULL, NULL);
 
-trace_pw_audio_init();
 assert(dev->driver == AUDIODEV_DRIVER_PIPEWIRE);
+trace_pw_audio_init();
+
+pw_init(NULL, NULL);
 
 pw->dev = dev;
 pw->thread_loop = pw_thread_loop_new("PipeWire thread loop", NULL);
-- 
2.40.1




[PATCH 00/12] audio: pipewire backend improvements

2023-05-06 Thread marcandre . lureau
From: Marc-André Lureau 

Hi,

Here are a few patches to cover PipeWire support in the CI and other misc code
improvements.

Note: depends on libvirt-ci!396

thanks

Marc-André Lureau (12):
  libvirt-ci: update submodule to cover pipewire
  tests/lcitool: add pipewire
  audio/pw: Pipewire->PipeWire case fix for user-visible text
  audio/pw: drop needless case statement
  audio/pw: needless check for NULL
  audio/pw: trace during init before calling pipewire API
  audio/pw: add more details on error
  audio/pw: factorize some common code
  audio/pw: add more error reporting
  audio/pw: simplify error reporting in stream creation
  audio/pw: remove wrong comment
  audio/pw: improve channel position code

 meson.build   |   2 +-
 qapi/audio.json   |  12 +-
 audio/pwaudio.c   | 212 +++---
 audio/trace-events|   2 +-
 meson_options.txt |   2 +-
 qemu-options.hx   |   4 +-
 scripts/meson-buildoptions.sh |   2 +-
 tests/docker/dockerfiles/alpine.docker|   1 +
 tests/docker/dockerfiles/centos8.docker   |   1 +
 .../dockerfiles/debian-amd64-cross.docker |   1 +
 tests/docker/dockerfiles/debian-amd64.docker  |   1 +
 .../dockerfiles/debian-arm64-cross.docker |   1 +
 .../dockerfiles/debian-armel-cross.docker |   1 +
 .../dockerfiles/debian-armhf-cross.docker |   1 +
 .../dockerfiles/debian-mips64el-cross.docker  |   1 +
 .../dockerfiles/debian-mipsel-cross.docker|   1 +
 .../dockerfiles/debian-ppc64el-cross.docker   |   1 +
 .../dockerfiles/debian-s390x-cross.docker |   1 +
 tests/docker/dockerfiles/fedora.docker|   1 +
 tests/docker/dockerfiles/opensuse-leap.docker |   1 +
 tests/docker/dockerfiles/ubuntu2204.docker|   1 +
 tests/lcitool/libvirt-ci  |   2 +-
 tests/lcitool/projects/qemu.yml   |   1 +
 23 files changed, 105 insertions(+), 148 deletions(-)

-- 
2.40.1




[PATCH 07/12] audio/pw: add more details on error

2023-05-06 Thread marcandre . lureau
From: Marc-André Lureau 

PipeWire uses errno to report error details.

Signed-off-by: Marc-André Lureau 
---
 audio/pwaudio.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/audio/pwaudio.c b/audio/pwaudio.c
index 2b12b40934..d0bc4680a6 100644
--- a/audio/pwaudio.c
+++ b/audio/pwaudio.c
@@ -750,6 +750,7 @@ static int wait_resync(pwaudio *pw)
 }
 return 0;
 }
+
 static void
 on_core_error(void *data, uint32_t id, int seq, int res, const char *message)
 {
@@ -793,19 +794,19 @@ qpw_audio_init(Audiodev *dev)
 pw->dev = dev;
 pw->thread_loop = pw_thread_loop_new("PipeWire thread loop", NULL);
 if (pw->thread_loop == NULL) {
-error_report("Could not create PipeWire loop");
+error_report("Could not create PipeWire loop: %s", g_strerror(errno));
 goto fail;
 }
 
 pw->context =
 pw_context_new(pw_thread_loop_get_loop(pw->thread_loop), NULL, 0);
 if (pw->context == NULL) {
-error_report("Could not create PipeWire context");
+error_report("Could not create PipeWire context: %s", 
g_strerror(errno));
 goto fail;
 }
 
 if (pw_thread_loop_start(pw->thread_loop) < 0) {
-error_report("Could not start PipeWire loop");
+error_report("Could not start PipeWire loop: %s", g_strerror(errno));
 goto fail;
 }
 
-- 
2.40.1




[PATCH 09/12] audio/pw: add more error reporting

2023-05-06 Thread marcandre . lureau
From: Marc-André Lureau 

Signed-off-by: Marc-André Lureau 
---
 audio/pwaudio.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/audio/pwaudio.c b/audio/pwaudio.c
index 67df53948c..5c706a9fde 100644
--- a/audio/pwaudio.c
+++ b/audio/pwaudio.c
@@ -429,6 +429,10 @@ create_stream(pwaudio *c, PWVoice *v, const char 
*stream_name,
 struct pw_properties *props;
 
 props = pw_properties_new(NULL, NULL);
+if (!props) {
+error_report("Failed to create PW properties: %s", g_strerror(errno));
+return -1;
+}
 
 /* 75% of the timer period for faster updates */
 buf_samples = (uint64_t)v->g->dev->timer_period * v->info.rate
@@ -441,8 +445,8 @@ create_stream(pwaudio *c, PWVoice *v, const char 
*stream_name,
 pw_properties_set(props, PW_KEY_TARGET_OBJECT, name);
 }
 v->stream = pw_stream_new(c->core, stream_name, props);
-
 if (v->stream == NULL) {
+error_report("Failed to create PW stream: %s", g_strerror(errno));
 return -1;
 }
 
@@ -470,6 +474,7 @@ create_stream(pwaudio *c, PWVoice *v, const char 
*stream_name,
 PW_STREAM_FLAG_MAP_BUFFERS |
 PW_STREAM_FLAG_RT_PROCESS, params, n_params);
 if (res < 0) {
+error_report("Failed to connect PW stream: %s", g_strerror(errno));
 pw_stream_destroy(v->stream);
 return -1;
 }
-- 
2.40.1




[PATCH 10/12] audio/pw: simplify error reporting in stream creation

2023-05-06 Thread marcandre . lureau
From: Marc-André Lureau 

create_stream() now reports on all error paths.

Signed-off-by: Marc-André Lureau 
---
 audio/pwaudio.c | 12 +---
 1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/audio/pwaudio.c b/audio/pwaudio.c
index 5c706a9fde..38905f5be2 100644
--- a/audio/pwaudio.c
+++ b/audio/pwaudio.c
@@ -486,8 +486,6 @@ static int
 qpw_stream_new(pwaudio *c, PWVoice *v, const char *stream_name,
const char *name, enum spa_direction dir)
 {
-int r;
-
 switch (v->info.channels) {
 case 8:
 v->info.position[0] = SPA_AUDIO_CHANNEL_FL;
@@ -540,13 +538,7 @@ qpw_stream_new(pwaudio *c, PWVoice *v, const char 
*stream_name,
 }
 
 /* create a new unconnected pwstream */
-r = create_stream(c, v, stream_name, name, dir);
-if (r < 0) {
-AUD_log(AUDIO_CAP, "Failed to create stream.");
-return -1;
-}
-
-return r;
+return create_stream(c, v, stream_name, name, dir);
 }
 
 static int
@@ -577,7 +569,6 @@ qpw_init_out(HWVoiceOut *hw, struct audsettings *as, void 
*drv_opaque)
 r = qpw_stream_new(c, v, ppdo->stream_name ? : c->dev->id,
ppdo->name, SPA_DIRECTION_OUTPUT);
 if (r < 0) {
-error_report("qpw_stream_new for playback failed");
 pw_thread_loop_unlock(c->thread_loop);
 return -1;
 }
@@ -621,7 +612,6 @@ qpw_init_in(HWVoiceIn *hw, struct audsettings *as, void 
*drv_opaque)
 r = qpw_stream_new(c, v, ppdo->stream_name ? : c->dev->id,
ppdo->name, SPA_DIRECTION_INPUT);
 if (r < 0) {
-error_report("qpw_stream_new for recording failed");
 pw_thread_loop_unlock(c->thread_loop);
 return -1;
 }
-- 
2.40.1




[PATCH 02/12] tests/lcitool: add pipewire

2023-05-06 Thread marcandre . lureau
From: Marc-André Lureau 

Signed-off-by: Marc-André Lureau 
---
 tests/docker/dockerfiles/alpine.docker| 1 +
 tests/docker/dockerfiles/centos8.docker   | 1 +
 tests/docker/dockerfiles/debian-amd64-cross.docker| 1 +
 tests/docker/dockerfiles/debian-amd64.docker  | 1 +
 tests/docker/dockerfiles/debian-arm64-cross.docker| 1 +
 tests/docker/dockerfiles/debian-armel-cross.docker| 1 +
 tests/docker/dockerfiles/debian-armhf-cross.docker| 1 +
 tests/docker/dockerfiles/debian-mips64el-cross.docker | 1 +
 tests/docker/dockerfiles/debian-mipsel-cross.docker   | 1 +
 tests/docker/dockerfiles/debian-ppc64el-cross.docker  | 1 +
 tests/docker/dockerfiles/debian-s390x-cross.docker| 1 +
 tests/docker/dockerfiles/fedora.docker| 1 +
 tests/docker/dockerfiles/opensuse-leap.docker | 1 +
 tests/docker/dockerfiles/ubuntu2204.docker| 1 +
 tests/lcitool/projects/qemu.yml   | 1 +
 15 files changed, 15 insertions(+)

diff --git a/tests/docker/dockerfiles/alpine.docker 
b/tests/docker/dockerfiles/alpine.docker
index 81c70aeaf9..d47101e042 100644
--- a/tests/docker/dockerfiles/alpine.docker
+++ b/tests/docker/dockerfiles/alpine.docker
@@ -77,6 +77,7 @@ RUN apk update && \
 numactl-dev \
 openssh-client \
 pcre-dev \
+pipewire-dev \
 pixman-dev \
 pkgconf \
 pulseaudio-dev \
diff --git a/tests/docker/dockerfiles/centos8.docker 
b/tests/docker/dockerfiles/centos8.docker
index 1a6a9087c1..f7d46ebd9c 100644
--- a/tests/docker/dockerfiles/centos8.docker
+++ b/tests/docker/dockerfiles/centos8.docker
@@ -90,6 +90,7 @@ RUN dnf distro-sync -y && \
 openssh-clients \
 pam-devel \
 pcre-static \
+pipewire-devel \
 pixman-devel \
 pkgconfig \
 pulseaudio-libs-devel \
diff --git a/tests/docker/dockerfiles/debian-amd64-cross.docker 
b/tests/docker/dockerfiles/debian-amd64-cross.docker
index 2e7eb445f1..26109fe4d6 100644
--- a/tests/docker/dockerfiles/debian-amd64-cross.docker
+++ b/tests/docker/dockerfiles/debian-amd64-cross.docker
@@ -114,6 +114,7 @@ RUN export DEBIAN_FRONTEND=noninteractive && \
   libnfs-dev:amd64 \
   libnuma-dev:amd64 \
   libpam0g-dev:amd64 \
+  libpipewire-0.3-dev:amd64 \
   libpixman-1-dev:amd64 \
   libpmem-dev:amd64 \
   libpng-dev:amd64 \
diff --git a/tests/docker/dockerfiles/debian-amd64.docker 
b/tests/docker/dockerfiles/debian-amd64.docker
index 28e2fa81b1..8ba1c13d8d 100644
--- a/tests/docker/dockerfiles/debian-amd64.docker
+++ b/tests/docker/dockerfiles/debian-amd64.docker
@@ -70,6 +70,7 @@ RUN export DEBIAN_FRONTEND=noninteractive && \
   libnuma-dev \
   libpam0g-dev \
   libpcre2-dev \
+  libpipewire-0.3-dev \
   libpixman-1-dev \
   libpmem-dev \
   libpng-dev \
diff --git a/tests/docker/dockerfiles/debian-arm64-cross.docker 
b/tests/docker/dockerfiles/debian-arm64-cross.docker
index f558770f84..f560ed6044 100644
--- a/tests/docker/dockerfiles/debian-arm64-cross.docker
+++ b/tests/docker/dockerfiles/debian-arm64-cross.docker
@@ -114,6 +114,7 @@ RUN export DEBIAN_FRONTEND=noninteractive && \
   libnfs-dev:arm64 \
   libnuma-dev:arm64 \
   libpam0g-dev:arm64 \
+  libpipewire-0.3-dev:arm64 \
   libpixman-1-dev:arm64 \
   libpng-dev:arm64 \
   libpulse-dev:arm64 \
diff --git a/tests/docker/dockerfiles/debian-armel-cross.docker 
b/tests/docker/dockerfiles/debian-armel-cross.docker
index f3d7e07cce..41f9f67417 100644
--- a/tests/docker/dockerfiles/debian-armel-cross.docker
+++ b/tests/docker/dockerfiles/debian-armel-cross.docker
@@ -114,6 +114,7 @@ RUN export DEBIAN_FRONTEND=noninteractive && \
   libnfs-dev:armel \
   libnuma-dev:armel \
   libpam0g-dev:armel \
+  libpipewire-0.3-dev:armel \
   libpixman-1-dev:armel \
   libpng-dev:armel \
   libpulse-dev:armel \
diff --git a/tests/docker/dockerfiles/debian-armhf-cross.docker 
b/tests/docker/dockerfiles/debian-armhf-cross.docker
index 531c556ad5..1a095c6506 100644
--- a/tests/docker/dockerfiles/debian-armhf-cross.docker
+++ b/tests/docker/dockerfiles/debian-armhf-cross.docker
@@ -114,6 +114,7 @@ RUN export DEBIAN_FRONTEND=noninteractive && \
   libnfs-dev:armhf \
   libnuma-dev:armhf \
   libpam0g-dev:armhf \
+  libpipewire-0.3-dev:armhf \
   libpixman-1-dev:armhf \
   libpng-dev:

[PATCH 03/12] audio/pw: Pipewire->PipeWire case fix for user-visible text

2023-05-06 Thread marcandre . lureau
From: Marc-André Lureau 

"PipeWire" is the correct case.

Signed-off-by: Marc-André Lureau 
---
 meson.build   |  2 +-
 qapi/audio.json   | 12 ++--
 audio/pwaudio.c   | 10 +-
 audio/trace-events|  2 +-
 meson_options.txt |  2 +-
 qemu-options.hx   |  4 ++--
 scripts/meson-buildoptions.sh |  2 +-
 7 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/meson.build b/meson.build
index 229eb585f7..4c44736bd4 100644
--- a/meson.build
+++ b/meson.build
@@ -3988,7 +3988,7 @@ if targetos == 'linux'
   summary_info += {'ALSA support':alsa}
   summary_info += {'PulseAudio support': pulse}
 endif
-summary_info += {'Pipewire support':   pipewire}
+summary_info += {'PipeWire support':   pipewire}
 summary_info += {'JACK support':  jack}
 summary_info += {'brlapi support':brlapi}
 summary_info += {'vde support':   vde}
diff --git a/qapi/audio.json b/qapi/audio.json
index e03396a7bc..b5c1af2b91 100644
--- a/qapi/audio.json
+++ b/qapi/audio.json
@@ -327,17 +327,17 @@
 ##
 # @AudiodevPipewirePerDirectionOptions:
 #
-# Options of the Pipewire backend that are used for both playback and
+# Options of the PipeWire backend that are used for both playback and
 # recording.
 #
 # @name: name of the sink/source to use
 #
-# @stream-name: name of the Pipewire stream created by qemu.  Can be
-#   used to identify the stream in Pipewire when you
-#   create multiple Pipewire devices or run multiple qemu
+# @stream-name: name of the PipeWire stream created by qemu.  Can be
+#   used to identify the stream in PipeWire when you
+#   create multiple PipeWire devices or run multiple qemu
 #   instances (default: audiodev's id)
 #
-# @latency: latency you want Pipewire to achieve in microseconds
+# @latency: latency you want PipeWire to achieve in microseconds
 #   (default 46000)
 #
 # Since: 8.1
@@ -352,7 +352,7 @@
 ##
 # @AudiodevPipewireOptions:
 #
-# Options of the Pipewire audio backend.
+# Options of the PipeWire audio backend.
 #
 # @in: options of the capture stream
 #
diff --git a/audio/pwaudio.c b/audio/pwaudio.c
index 1d108bdebb..9eb69bfd18 100644
--- a/audio/pwaudio.c
+++ b/audio/pwaudio.c
@@ -1,5 +1,5 @@
 /*
- * QEMU Pipewire audio driver
+ * QEMU PipeWire audio driver
  *
  * Copyright (c) 2023 Red Hat Inc.
  *
@@ -800,21 +800,21 @@ qpw_audio_init(Audiodev *dev)
 assert(dev->driver == AUDIODEV_DRIVER_PIPEWIRE);
 
 pw->dev = dev;
-pw->thread_loop = pw_thread_loop_new("Pipewire thread loop", NULL);
+pw->thread_loop = pw_thread_loop_new("PipeWire thread loop", NULL);
 if (pw->thread_loop == NULL) {
-error_report("Could not create Pipewire loop");
+error_report("Could not create PipeWire loop");
 goto fail;
 }
 
 pw->context =
 pw_context_new(pw_thread_loop_get_loop(pw->thread_loop), NULL, 0);
 if (pw->context == NULL) {
-error_report("Could not create Pipewire context");
+error_report("Could not create PipeWire context");
 goto fail;
 }
 
 if (pw_thread_loop_start(pw->thread_loop) < 0) {
-error_report("Could not start Pipewire loop");
+error_report("Could not start PipeWire loop");
 goto fail;
 }
 
diff --git a/audio/trace-events b/audio/trace-events
index 85dbb506b2..ab04f020ce 100644
--- a/audio/trace-events
+++ b/audio/trace-events
@@ -24,7 +24,7 @@ pw_read(int32_t avail, uint32_t index, size_t len) "avail=%d 
index=%u len=%zu"
 pw_write(int32_t filled, int32_t avail, uint32_t index, size_t len) "filled=%d 
avail=%d index=%u len=%zu"
 pw_vol(const char *ret) "set volume: %s"
 pw_period(uint64_t quantum, uint32_t rate) "period =%" PRIu64 "/%u"
-pw_audio_init(void) "Initialize Pipewire context"
+pw_audio_init(void) "Initialize PipeWire context"
 
 # audio.c
 audio_timer_start(int interval) "interval %d ms"
diff --git a/meson_options.txt b/meson_options.txt
index ae2017702a..8dd786c1a4 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -256,7 +256,7 @@ option('oss', type: 'feature', value: 'auto',
 option('pa', type: 'feature', value: 'auto',
description: 'PulseAudio sound support')
 option('pipewire', type: 'feature', value: 'auto',
-   description: 'Pipewire sound support')
+   description: 'PipeWire sound support')
 option('sndio', type: 'feature', value: 'auto',
description: 'sndio sound support')
 
diff --git a/qemu-options.hx b/qemu-options.hx
index 42b9094c10..be7317d455 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -963,10 +963,10 @@ SRST
 to honor this value but actual latencies may be lower or higher.
 
 ``-audiodev pipewire,id=id[,prop[=value][,...]]``
-Creates a backend using Pipewire. This backend is available on
+Creates a backend using PipeWire. This backend is available on
 most systems.
 
-Pipewire specific options are:
+PipeWire specific 

[PATCH 08/12] audio/pw: factorize some common code

2023-05-06 Thread marcandre . lureau
From: Marc-André Lureau 

Signed-off-by: Marc-André Lureau 
---
 audio/pwaudio.c | 85 -
 1 file changed, 34 insertions(+), 51 deletions(-)

diff --git a/audio/pwaudio.c b/audio/pwaudio.c
index d0bc4680a6..67df53948c 100644
--- a/audio/pwaudio.c
+++ b/audio/pwaudio.c
@@ -66,6 +66,9 @@ typedef struct PWVoiceIn {
 PWVoice v;
 } PWVoiceIn;
 
+#define PW_VOICE_IN(v) ((PWVoiceIn*)v)
+#define PW_VOICE_OUT(v) ((PWVoiceOut*)v)
+
 static void
 stream_destroy(void *data)
 {
@@ -630,62 +633,55 @@ qpw_init_in(HWVoiceIn *hw, struct audsettings *as, void 
*drv_opaque)
 }
 
 static void
-qpw_fini_out(HWVoiceOut *hw)
+qpw_voice_fini(PWVoice *v)
 {
-PWVoiceOut *pw = (PWVoiceOut *) hw;
-PWVoice *v = &pw->v;
+pwaudio *c = v->g;
 
-if (v->stream) {
-pwaudio *c = v->g;
-pw_thread_loop_lock(c->thread_loop);
-pw_stream_destroy(v->stream);
-v->stream = NULL;
-pw_thread_loop_unlock(c->thread_loop);
+if (!v->stream) {
+return;
 }
+pw_thread_loop_lock(c->thread_loop);
+pw_stream_destroy(v->stream);
+v->stream = NULL;
+pw_thread_loop_unlock(c->thread_loop);
 }
 
 static void
-qpw_fini_in(HWVoiceIn *hw)
+qpw_fini_out(HWVoiceOut *hw)
 {
-PWVoiceIn *pw = (PWVoiceIn *) hw;
-PWVoice *v = &pw->v;
+qpw_voice_fini(&PW_VOICE_OUT(hw)->v);
+}
 
-if (v->stream) {
-pwaudio *c = v->g;
-pw_thread_loop_lock(c->thread_loop);
-pw_stream_destroy(v->stream);
-v->stream = NULL;
-pw_thread_loop_unlock(c->thread_loop);
-}
+static void
+qpw_fini_in(HWVoiceIn *hw)
+{
+qpw_voice_fini(&PW_VOICE_IN(hw)->v);
 }
 
 static void
-qpw_enable_out(HWVoiceOut *hw, bool enable)
+qpw_voice_set_enabled(PWVoice *v, bool enable)
 {
-PWVoiceOut *po = (PWVoiceOut *) hw;
-PWVoice *v = &po->v;
 pwaudio *c = v->g;
 pw_thread_loop_lock(c->thread_loop);
 pw_stream_set_active(v->stream, enable);
 pw_thread_loop_unlock(c->thread_loop);
 }
 
+static void
+qpw_enable_out(HWVoiceOut *hw, bool enable)
+{
+qpw_voice_set_enabled(&PW_VOICE_OUT(hw)->v, enable);
+}
+
 static void
 qpw_enable_in(HWVoiceIn *hw, bool enable)
 {
-PWVoiceIn *pi = (PWVoiceIn *) hw;
-PWVoice *v = &pi->v;
-pwaudio *c = v->g;
-pw_thread_loop_lock(c->thread_loop);
-pw_stream_set_active(v->stream, enable);
-pw_thread_loop_unlock(c->thread_loop);
+qpw_voice_set_enabled(&PW_VOICE_IN(hw)->v, enable);
 }
 
 static void
-qpw_volume_out(HWVoiceOut *hw, Volume *vol)
+qpw_voice_set_volume(PWVoice *v, Volume *vol)
 {
-PWVoiceOut *pw = (PWVoiceOut *) hw;
-PWVoice *v = &pw->v;
 pwaudio *c = v->g;
 int i, ret;
 
@@ -707,28 +703,15 @@ qpw_volume_out(HWVoiceOut *hw, Volume *vol)
 }
 
 static void
-qpw_volume_in(HWVoiceIn *hw, Volume *vol)
+qpw_volume_out(HWVoiceOut *hw, Volume *vol)
 {
-PWVoiceIn *pw = (PWVoiceIn *) hw;
-PWVoice *v = &pw->v;
-pwaudio *c = v->g;
-int i, ret;
-
-pw_thread_loop_lock(c->thread_loop);
-v->volume.channels = vol->channels;
-
-for (i = 0; i < vol->channels; ++i) {
-v->volume.values[i] = (float)vol->vol[i] / 255;
-}
-
-ret = pw_stream_set_control(v->stream,
-SPA_PROP_channelVolumes, v->volume.channels, v->volume.values, 0);
-trace_pw_vol(ret == 0 ? "success" : "failed");
+qpw_voice_set_volume(&PW_VOICE_OUT(hw)->v, vol);
+}
 
-v->muted = vol->mute;
-float val = v->muted ? 1.f : 0.f;
-ret = pw_stream_set_control(v->stream, SPA_PROP_mute, 1, &val, 0);
-pw_thread_loop_unlock(c->thread_loop);
+static void
+qpw_volume_in(HWVoiceIn *hw, Volume *vol)
+{
+qpw_voice_set_volume(&PW_VOICE_IN(hw)->v, vol);
 }
 
 static int wait_resync(pwaudio *pw)
-- 
2.40.1




[PATCH 11/12] audio/pw: remove wrong comment

2023-05-06 Thread marcandre . lureau
From: Marc-André Lureau 

The stream is actually created connected.

Signed-off-by: Marc-André Lureau 
---
 audio/pwaudio.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/audio/pwaudio.c b/audio/pwaudio.c
index 38905f5be2..f74d506ec6 100644
--- a/audio/pwaudio.c
+++ b/audio/pwaudio.c
@@ -537,7 +537,6 @@ qpw_stream_new(pwaudio *c, PWVoice *v, const char 
*stream_name,
 break;
 }
 
-/* create a new unconnected pwstream */
 return create_stream(c, v, stream_name, name, dir);
 }
 
-- 
2.40.1




[PATCH 04/12] audio/pw: drop needless case statement

2023-05-06 Thread marcandre . lureau
From: Marc-André Lureau 

Signed-off-by: Marc-André Lureau 
---
 audio/pwaudio.c | 10 --
 1 file changed, 10 deletions(-)

diff --git a/audio/pwaudio.c b/audio/pwaudio.c
index 9eb69bfd18..51cfc0b052 100644
--- a/audio/pwaudio.c
+++ b/audio/pwaudio.c
@@ -197,16 +197,6 @@ on_stream_state_changed(void *data, enum pw_stream_state 
old,
 
 trace_pw_state_changed(pw_stream_get_node_id(v->stream),
pw_stream_state_as_string(state));
-
-switch (state) {
-case PW_STREAM_STATE_ERROR:
-case PW_STREAM_STATE_UNCONNECTED:
-break;
-case PW_STREAM_STATE_PAUSED:
-case PW_STREAM_STATE_CONNECTING:
-case PW_STREAM_STATE_STREAMING:
-break;
-}
 }
 
 static const struct pw_stream_events capture_stream_events = {
-- 
2.40.1




[PATCH 12/12] audio/pw: improve channel position code

2023-05-06 Thread marcandre . lureau
From: Marc-André Lureau 

Follow the PulseAudio backend's comment and code: only implement the
channel counts QEMU actually supports at this point, and add the same comment
about limits and future mappings. Simplify the code a bit.

Signed-off-by: Marc-André Lureau 
---
 audio/pwaudio.c | 75 +
 1 file changed, 26 insertions(+), 49 deletions(-)

diff --git a/audio/pwaudio.c b/audio/pwaudio.c
index f74d506ec6..062610a704 100644
--- a/audio/pwaudio.c
+++ b/audio/pwaudio.c
@@ -417,8 +417,8 @@ pw_to_audfmt(enum spa_audio_format fmt, int *endianness,
 }
 
 static int
-create_stream(pwaudio *c, PWVoice *v, const char *stream_name,
-  const char *name, enum spa_direction dir)
+qpw_stream_new(pwaudio *c, PWVoice *v, const char *stream_name,
+   const char *name, enum spa_direction dir)
 {
 int res;
 uint32_t n_params;
@@ -482,62 +482,37 @@ create_stream(pwaudio *c, PWVoice *v, const char 
*stream_name,
 return 0;
 }
 
-static int
-qpw_stream_new(pwaudio *c, PWVoice *v, const char *stream_name,
-   const char *name, enum spa_direction dir)
+static void
+qpw_set_position(uint32_t channels, uint32_t position[SPA_AUDIO_MAX_CHANNELS])
 {
-switch (v->info.channels) {
+memcpy(position, (uint32_t[SPA_AUDIO_MAX_CHANNELS]) { 
SPA_AUDIO_CHANNEL_UNKNOWN, },
+   sizeof(uint32_t) * SPA_AUDIO_MAX_CHANNELS);
+/*
+ * TODO: This currently expects the only frontend supporting more than 2
+ * channels is the usb-audio.  We will need some means to set channel
+ * order when a new frontend gains multi-channel support.
+ */
+switch (channels) {
 case 8:
-v->info.position[0] = SPA_AUDIO_CHANNEL_FL;
-v->info.position[1] = SPA_AUDIO_CHANNEL_FR;
-v->info.position[2] = SPA_AUDIO_CHANNEL_FC;
-v->info.position[3] = SPA_AUDIO_CHANNEL_LFE;
-v->info.position[4] = SPA_AUDIO_CHANNEL_RL;
-v->info.position[5] = SPA_AUDIO_CHANNEL_RR;
-v->info.position[6] = SPA_AUDIO_CHANNEL_SL;
-v->info.position[7] = SPA_AUDIO_CHANNEL_SR;
-break;
+position[6] = SPA_AUDIO_CHANNEL_SL;
+position[7] = SPA_AUDIO_CHANNEL_SR;
+/* fallthrough */
 case 6:
-v->info.position[0] = SPA_AUDIO_CHANNEL_FL;
-v->info.position[1] = SPA_AUDIO_CHANNEL_FR;
-v->info.position[2] = SPA_AUDIO_CHANNEL_FC;
-v->info.position[3] = SPA_AUDIO_CHANNEL_LFE;
-v->info.position[4] = SPA_AUDIO_CHANNEL_RL;
-v->info.position[5] = SPA_AUDIO_CHANNEL_RR;
-break;
-case 5:
-v->info.position[0] = SPA_AUDIO_CHANNEL_FL;
-v->info.position[1] = SPA_AUDIO_CHANNEL_FR;
-v->info.position[2] = SPA_AUDIO_CHANNEL_FC;
-v->info.position[3] = SPA_AUDIO_CHANNEL_LFE;
-v->info.position[4] = SPA_AUDIO_CHANNEL_RC;
-break;
-case 4:
-v->info.position[0] = SPA_AUDIO_CHANNEL_FL;
-v->info.position[1] = SPA_AUDIO_CHANNEL_FR;
-v->info.position[2] = SPA_AUDIO_CHANNEL_FC;
-v->info.position[3] = SPA_AUDIO_CHANNEL_RC;
-break;
-case 3:
-v->info.position[0] = SPA_AUDIO_CHANNEL_FL;
-v->info.position[1] = SPA_AUDIO_CHANNEL_FR;
-v->info.position[2] = SPA_AUDIO_CHANNEL_LFE;
-break;
+position[2] = SPA_AUDIO_CHANNEL_FC;
+position[3] = SPA_AUDIO_CHANNEL_LFE;
+position[4] = SPA_AUDIO_CHANNEL_RL;
+position[5] = SPA_AUDIO_CHANNEL_RR;
+/* fallthrough */
 case 2:
-v->info.position[0] = SPA_AUDIO_CHANNEL_FL;
-v->info.position[1] = SPA_AUDIO_CHANNEL_FR;
+position[0] = SPA_AUDIO_CHANNEL_FL;
+position[1] = SPA_AUDIO_CHANNEL_FR;
 break;
 case 1:
-v->info.position[0] = SPA_AUDIO_CHANNEL_MONO;
+position[0] = SPA_AUDIO_CHANNEL_MONO;
 break;
 default:
-for (size_t i = 0; i < v->info.channels; i++) {
-v->info.position[i] = SPA_AUDIO_CHANNEL_UNKNOWN;
-}
-break;
+dolog("Internal error: unsupported channel count %d\n", channels);
 }
-
-return create_stream(c, v, stream_name, name, dir);
 }
 
 static int
@@ -555,6 +530,7 @@ qpw_init_out(HWVoiceOut *hw, struct audsettings *as, void 
*drv_opaque)
 
 v->info.format = audfmt_to_pw(as->fmt, as->endianness);
 v->info.channels = as->nchannels;
+qpw_set_position(as->nchannels, v->info.position);
 v->info.rate = as->freq;
 
 obt_as.fmt =
@@ -601,6 +577,7 @@ qpw_init_in(HWVoiceIn *hw, struct audsettings *as, void 
*drv_opaque)
 
 v->info.format = audfmt_to_pw(as->fmt, as->endianness);
 v->info.channels = as->nchannels;
+qpw_set_position(as->nchannels, v->info.position);
 v->info.rate = as->freq;
 
 obt_as.fmt =
-- 
2.40.1




[PATCH 01/12] libvirt-ci: update submodule to cover pipewire

2023-05-06 Thread marcandre . lureau
From: Marc-André Lureau 

List of upstream changes:

Abdulwasiu Apalowo (6):
  commandline: add default tag information to image argument
  containers: add tag parameter to image_exists method
  lcitool: edit error message during container run (or shell) operation.
  containers: change the mode bits of --script argument
  containers: mount temporary directory to user's home in the container
  containers: always change workdir to the user's home

Ani Sinha (1):
  mappings: add new package mappings for mformat and xorriso

Erik Skultety (17):
  docs: mappings: Add a section on the preferred mapping naming scheme
  facts: projects: nbdkit: Replace zstd mapping with libzstd
  facts: mappings: Drop 'zstd' mapping
  facts: targets: Add Fedora 38
  gitlab-ci.yml: Add Fedora 38 target
  facts: targets: Drop Fedora 36 target
  Add a pytest.ini
  tests: commands: Consolidate the installed package/run from git tests
  Add tox.ini configuration file
  test-requirements: Rename to dev-requirements.txt
  requirements: Add tox to dev-requirements.txt and drop pytest and flake
  dev-requirements: Reference VM requirements
  gitignore: Add the default .tox directory
  tox: Allow running with custom pytest options with {posargs}
  gitlab-ci.yml: Start using tox for testing
  .gitlab-ci.yml: Always test against installed lcitool
  docs: testing: Update contents with tox

Marc-André Lureau (1):
  facts/mappings & qemu: add pipewire

Signed-off-by: Marc-André Lureau 
---
 tests/lcitool/libvirt-ci | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/lcitool/libvirt-ci b/tests/lcitool/libvirt-ci
index 85487e1404..2e0571c3e0 16
--- a/tests/lcitool/libvirt-ci
+++ b/tests/lcitool/libvirt-ci
@@ -1 +1 @@
-Subproject commit 85487e140415b2ac54b01a9a6b600fd7c21edc2f
+Subproject commit 2e0571c3e0722c79b90decb2c7fd9fa1deebbd46
-- 
2.40.1




[PATCH] sbsa-ref: switch default cpu core to Neoverse-N1

2023-05-06 Thread Marcin Juszkiewicz
The world outside moves to newer and newer CPU cores. Let's move the SBSA
Reference Platform to something newer as well.

Signed-off-by: Marcin Juszkiewicz 
---
 hw/arm/sbsa-ref.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/arm/sbsa-ref.c b/hw/arm/sbsa-ref.c
index 0b93558dde..a1562f944a 100644
--- a/hw/arm/sbsa-ref.c
+++ b/hw/arm/sbsa-ref.c
@@ -852,7 +852,7 @@ static void sbsa_ref_class_init(ObjectClass *oc, void *data)
 
 mc->init = sbsa_ref_init;
 mc->desc = "QEMU 'SBSA Reference' ARM Virtual Machine";
-mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-a57");
+mc->default_cpu_type = ARM_CPU_TYPE_NAME("neoverse-n1");
 mc->max_cpus = 512;
 mc->pci_allow_0_address = true;
 mc->minimum_page_bits = 12;
-- 
2.39.2




Re: [PATCH] virtio-net: not enable vq reset feature unconditionally

2023-05-06 Thread Michael S. Tsirkin
On Sat, May 06, 2023 at 10:13:36AM +0800, Xuan Zhuo wrote:
> On Thu, 4 May 2023 12:14:47 +0200, Eugenio Pérez wrote:
> > The commit 93a97dc5200a ("virtio-net: enable vq reset feature")
> > unconditionally enables the vq reset feature as long as the device is
> > emulated. This makes it impossible to actually disable the feature,
> > and it causes migration problems from qemu versions earlier than 7.2.
> >
> > The entire final commit is unneeded as the device feature system
> > already enables or disables the feature properly.
> >
> > This reverts commit 93a97dc5200a95e63b99cb625f20b7ae802ba413.
> > Fixes: 93a97dc5200a ("virtio-net: enable vq reset feature")
> > Signed-off-by: Eugenio Pérez 
> >
> > ---
> > Tested by checking feature bit at  /sys/devices/pci.../virtio0/features
> > enabling and disabling queue_reset virtio-net feature and vhost=on/off
> > on net device backend.
> 
> Do you mean that this feature cannot be disabled?
> 
> I tried to disable it in the guest, and it was successful.
> 
> In addition, in this case, could you try to fix the problem instead of
> directly reverting it?
> 
> Thanks.

What does your patch accomplish though? If it's not needed,
let's not do it.

> > ---
> >  hw/net/virtio-net.c | 1 -
> >  1 file changed, 1 deletion(-)
> >
> > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> > index 53e1c32643..4ea33b6e2e 100644
> > --- a/hw/net/virtio-net.c
> > +++ b/hw/net/virtio-net.c
> > @@ -805,7 +805,6 @@ static uint64_t virtio_net_get_features(VirtIODevice 
> > *vdev, uint64_t features,
> >  }
> >
> >  if (!get_vhost_net(nc->peer)) {
> > -virtio_add_feature(&features, VIRTIO_F_RING_RESET);
> >  return features;
> >  }
> >
> > --
> > 2.31.1
> >




Re: [PATCH v3 6/6] Hexagon (linux-user/hexagon): handle breakpoints

2023-05-06 Thread Richard Henderson

On 5/4/23 16:37, Matheus Tavares Bernardino wrote:

This enables LLDB to work with hexagon linux-user mode through the GDB
remote protocol.

Helped-by: Richard Henderson
Signed-off-by: Matheus Tavares Bernardino
---
  linux-user/hexagon/cpu_loop.c | 3 +++
  1 file changed, 3 insertions(+)


Reviewed-by: Richard Henderson 

r~