Re: [PATCH v2 21/46] target/loongarch: Implement xvsigncov

2023-07-08 Thread Richard Henderson

On 6/30/23 08:58, Song Gao wrote:

This patch includes:
- XVSIGNCOV.{B/H/W/D}.

Signed-off-by: Song Gao
---
  target/loongarch/disas.c | 5 +
  target/loongarch/insn_trans/trans_lasx.c.inc | 5 +
  target/loongarch/insns.decode| 5 +
  target/loongarch/vec.h   | 2 ++
  target/loongarch/vec_helper.c| 2 --
  5 files changed, 17 insertions(+), 2 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH v2 22/46] target/loongarch: Implement xvmskltz/xvmskgez/xvmsknz

2023-07-08 Thread Richard Henderson

On 6/30/23 08:58, Song Gao wrote:

-void HELPER(vmskltz_b)(CPULoongArchState *env, uint32_t vd, uint32_t vj)
+void HELPER(vmskltz_b)(CPULoongArchState *env,
+   uint32_t oprsz, uint32_t vd, uint32_t vj)
  {
-uint16_t temp = 0;
+int i, max;
+uint16_t temp;
  VReg *Vd = &(env->fpr[vd].vreg);
  VReg *Vj = &(env->fpr[vj].vreg);
  
-temp = do_vmskltz_b(Vj->D(0));
-temp |= (do_vmskltz_b(Vj->D(1)) << 8);
-Vd->D(0) = temp;
-Vd->D(1) = 0;
+max = (oprsz == 16) ? 1 : 2;
+
+for (i = 0; i < max; i++) {
+temp = 0;
+temp = do_vmskltz_b(Vj->D(2 * i));


void * and desc operands; loop over oprsz.


r~
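The "void * and desc operands; loop over oprsz" suggestion refers to the usual gvec-style helper shape. Below is a self-contained sketch of that shape with stand-in types: the `VReg` union, the `simd_oprsz()` decoding, and the host-endian lane accessors are simplified placeholders, not QEMU's real definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for target/loongarch's VReg; assumes a little-endian host. */
typedef union {
    int8_t   B[32];
    uint64_t D[4];
} VReg;

/* Pack the sign bit of each byte in one 64-bit lane into 8 mask bits. */
static uint16_t do_vmskltz_b(uint64_t a)
{
    uint16_t m = 0;
    for (int i = 0; i < 8; i++) {
        m |= ((a >> (8 * i + 7)) & 1) << i;
    }
    return m;
}

/* Helper in the suggested style: void* operands plus a descriptor,
 * looping over the operand size (16 or 32 bytes) instead of branching
 * on it.  Here desc directly carries oprsz; the real code would call
 * simd_oprsz(desc). */
static void vmskltz_b(void *vd, void *vj, uint32_t desc)
{
    VReg *Vd = vd, *Vj = vj;
    uint32_t oprsz = desc;              /* stand-in for simd_oprsz(desc) */

    for (uint32_t ofs = 0; ofs < oprsz; ofs += 16) {
        int i = ofs / 8;                /* D-lane index of this 128-bit chunk */
        uint64_t t = do_vmskltz_b(Vj->D[i]);
        t |= (uint64_t)do_vmskltz_b(Vj->D[i + 1]) << 8;
        Vd->D[i] = t;
        Vd->D[i + 1] = 0;
    }
}
```

With this shape, the 128-bit (LSX) and 256-bit (LASX) cases fall out of the same loop with no `max = (oprsz == 16) ? 1 : 2` special-casing.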



Re: [PATCH] chore: rename `tricore_feature` to `is_tricore_feature_enabled`

2023-07-08 Thread Bastian Koppelmann
Hi Rui,

On Thu, Jul 06, 2023 at 12:59:55PM -0400, Rui Chen wrote:
> While upgrading capstone to v5, there was some name clash with the
> tricore_feature in capstone (which was introduced in this PR), thus rename
> tricore_feature to is_tricore_feature_enabled.
> 
> Build error log is below
> 
> /opt/homebrew/Cellar/capstone/5.0/include/capstone/tricore.h:561:3: error:
> redefinition of 'tricore_feature' as different kind of symbol
> } tricore_feature;
>   ^
> ../target/tricore/cpu.h:261:19: note: previous definition is here
> static inline int tricore_feature(CPUTriCoreState *env, int feature)
>                   ^
> 1 error generated.

I ran into the same problem when trying out capstone. I think a better name
would be tricore_has_feature() to match has_feature() in
target/tricore/translate.c.

P.S. If you CC me, it helps my mail filter find your patch :). Also, we have a
rule on qemu-devel not to send patches as attachments. See
(https://www.qemu.org/docs/master/devel/submitting-a-patch.html)

Cheers,
Bastian



Re: [PATCH v2 23/46] target/loongarch: Implement xvldi

2023-07-08 Thread Richard Henderson

On 6/30/23 08:58, Song Gao wrote:

This patch includes:
- XVLDI.

Signed-off-by: Song Gao
---
  target/loongarch/disas.c | 7 +++
  target/loongarch/insn_trans/trans_lasx.c.inc | 2 ++
  target/loongarch/insn_trans/trans_lsx.c.inc  | 6 --
  target/loongarch/insns.decode| 2 ++
  4 files changed, 15 insertions(+), 2 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH v2 24/46] target/loongarch: Implement LASX logic instructions

2023-07-08 Thread Richard Henderson

On 6/30/23 08:58, Song Gao wrote:

+len = (simd_oprsz(v) == 16) ? LSX_LEN : LASX_LEN;


Use simd_oprsz directly, without the rest of the computation.


r~



Re: [PATCH v2 25/46] target/loongarch: Implement xvsll xvsrl xvsra xvrotr

2023-07-08 Thread Richard Henderson

On 6/30/23 08:58, Song Gao wrote:

This patch includes:
- XVSLL[I].{B/H/W/D};
- XVSRL[I].{B/H/W/D};
- XVSRA[I].{B/H/W/D};
- XVROTR[I].{B/H/W/D}.

Signed-off-by: Song Gao
---
  target/loongarch/disas.c | 36 
  target/loongarch/insn_trans/trans_lasx.c.inc | 36 
  target/loongarch/insns.decode| 33 ++
  3 files changed, 105 insertions(+)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH v2 26/46] target/loongarch: Implement xvsllwil xvextl

2023-07-08 Thread Richard Henderson

On 6/30/23 08:58, Song Gao wrote:

+#define VSLLWIL(NAME, BIT, E1, E2) \
+void HELPER(NAME)(CPULoongArchState *env, uint32_t oprsz,  \
+  uint32_t vd, uint32_t vj, uint32_t imm)  \
+{  \
+int i, max;\
+VReg temp; \
+VReg *Vd = &(env->fpr[vd].vreg);   \
+VReg *Vj = &(env->fpr[vj].vreg);   \
+typedef __typeof(temp.E1(0)) TD;   \
+   \
+temp.Q(0) = int128_zero(); \
+   \
+if (oprsz == 32) { \
+temp.Q(1) = int128_zero(); \
+}  \
+   \
+max = LSX_LEN / BIT;   \
+for (i = 0; i < max; i++) {\
+temp.E1(i) = (TD)Vj->E2(i) << (imm % BIT); \
+if (oprsz == 32) { \
+temp.E1(i + max) = (TD)Vj->E2(i + max * 2) << (imm % BIT); \
+}  \
+}  \
+*Vd = temp;\
+}


Function parameters using void* and desc.

VReg temp = { };

instead of conditional partial assignment.

Fix iteration, as previously discussed.


r~
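Putting the three review points together, a self-contained sketch of the revised VSLLWIL shape might look like the following. All names here are illustrative stand-ins: the real helper takes void* operands plus a desc word, and the real `VReg`/lane macros live in target/loongarch/vec.h.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for target/loongarch's VReg; assumes a little-endian host. */
typedef union {
    int8_t  B[32];
    int16_t H[16];
} VReg;

/* VSLLWIL.H.B sketch: widen the low 8 bytes of each 128-bit lane to
 * halfwords, shifted left by imm % 8.  oprsz is 16 (LSX) or 32 (LASX);
 * the real code would derive it from simd_oprsz(desc). */
static void vsllwil_h_b(void *vd, void *vj, uint32_t oprsz, uint32_t imm)
{
    VReg temp = {0};    /* zero-init replaces the conditional
                           int128_zero() assignments; QEMU spells it { } */
    VReg *Vd = vd, *Vj = vj;

    /* one pass per 128-bit lane; each lane widens its own low 8 bytes */
    for (uint32_t lane = 0; lane < oprsz / 16; lane++) {
        for (int i = 0; i < 8; i++) {
            temp.H[lane * 8 + i] =
                (int16_t)(Vj->B[lane * 16 + i] << (imm % 8));
        }
    }
    *Vd = temp;
}
```

The per-lane loop replaces the `if (oprsz == 32)` branches inside the element loop, and the zero-initialized `temp` replaces the conditional partial assignment.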



Re: [PATCH v2 20/46] target/loongarch: Implement vext2xv

2023-07-08 Thread Song Gao

Hi, Richard

在 2023/7/8 上午5:19, Richard Henderson 写道:

On 6/30/23 08:58, Song Gao wrote:

+#define VEXT2XV(NAME, BIT, E1, E2)    \
+void HELPER(NAME)(CPULoongArchState *env, uint32_t oprsz, \
+  uint32_t vd, uint32_t vj)   \
+{ \
+    int i;    \
+    VReg *Vd = &(env->fpr[vd].vreg);  \
+    VReg *Vj = &(env->fpr[vj].vreg);  \
+    VReg temp;    \
+  \
+    for (i = 0; i < LASX_LEN / BIT; i++) {    \
+    temp.E1(i) = Vj->E2(i);   \
+    } \
+    *Vd = temp;   \
+}


So unlike VEXT(H), this does compress in order?

Yes.


Anyway, function signature and iteration without LASX_LEN.
Isn't there a 128-bit helper to merge this with?


There is no similar 128 bit instructions.

Thanks.
Song Gao




[PATCH v3 2/2] target/riscv: Optimize ambiguous local variable in pmp_hart_has_privs

2023-07-08 Thread Ruibo Lu
These two values represent whether the start/end address is in the PMP range.
However, their type and name are ambiguous. This commit changes their
name and type to improve code readability and accuracy.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1735
Reviewed-by: Weiwei Li 
Reviewed-by: Philippe Mathieu-Daudé  
Signed-off-by: Ruibo Lu 
---
 target/riscv/pmp.c | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/target/riscv/pmp.c b/target/riscv/pmp.c
index 1a9279ba88..ea3d29217a 100644
--- a/target/riscv/pmp.c
+++ b/target/riscv/pmp.c
@@ -203,16 +203,16 @@ void pmp_update_rule_nums(CPURISCVState *env)
 }
 }
 
-static int pmp_is_in_range(CPURISCVState *env, int pmp_index,
-   target_ulong addr)
+static bool pmp_is_in_range(CPURISCVState *env, int pmp_index,
+target_ulong addr)
 {
-int result = 0;
+bool result = false;
 
 if ((addr >= env->pmp_state.addr[pmp_index].sa) &&
 (addr <= env->pmp_state.addr[pmp_index].ea)) {
-result = 1;
+result = true;
 } else {
-result = 0;
+result = false;
 }
 
 return result;
@@ -287,8 +287,8 @@ bool pmp_hart_has_privs(CPURISCVState *env, target_ulong 
addr,
 {
 int i = 0;
 int pmp_size = 0;
-target_ulong s = 0;
-target_ulong e = 0;
+bool sa_in = false;
+bool ea_in = false;
 
 /* Short cut if no rules */
 if (0 == pmp_get_num_rules(env)) {
@@ -314,11 +314,11 @@ bool pmp_hart_has_privs(CPURISCVState *env, target_ulong 
addr,
  * from low to high
  */
 for (i = 0; i < MAX_RISCV_PMPS; i++) {
-s = pmp_is_in_range(env, i, addr);
-e = pmp_is_in_range(env, i, addr + pmp_size - 1);
+sa_in = pmp_is_in_range(env, i, addr);
+ea_in = pmp_is_in_range(env, i, addr + pmp_size - 1);
 
 /* partially inside */
-if ((s + e) == 1) {
+if (sa_in ^ ea_in) {
 qemu_log_mask(LOG_GUEST_ERROR,
   "pmp violation - access is partially inside\n");
 *allowed_privs = 0;
@@ -339,7 +339,7 @@ bool pmp_hart_has_privs(CPURISCVState *env, target_ulong 
addr,
 (env->pmp_state.pmp[i].cfg_reg & PMP_WRITE) |
 ((env->pmp_state.pmp[i].cfg_reg & PMP_EXEC) >> 2);
 
-if (((s + e) == 2) && (PMP_AMATCH_OFF != a_field)) {
+if (sa_in && ea_in && (PMP_AMATCH_OFF != a_field)) {
 /*
  * If the PMP entry is not off and the address is in range,
  * do the priv check
-- 
2.41.0
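As a side note, once the return type is bool, the if/else in `pmp_is_in_range()` above could collapse into a single expression, and the "partially inside" test reads naturally as an XOR of the two endpoint checks. A hypothetical flattened sketch (with a stand-in `target_ulong` and the bounds passed directly instead of indexed from `env`):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t target_ulong;   /* stand-in for the RISC-V target type */

/* Flattened form: true iff addr lies within [sa, ea]. */
static bool pmp_is_in_range(target_ulong sa, target_ulong ea,
                            target_ulong addr)
{
    return addr >= sa && addr <= ea;
}

/* "Partially inside" means exactly one endpoint of the access falls
 * within the PMP entry's range. */
static bool pmp_partial_hit(target_ulong sa, target_ulong ea,
                            target_ulong addr, target_ulong size)
{
    bool sa_in = pmp_is_in_range(sa, ea, addr);
    bool ea_in = pmp_is_in_range(sa, ea, addr + size - 1);
    return sa_in ^ ea_in;
}
```

This is only an illustration of the boolean simplification, not a proposed further patch.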




[PATCH v3 0/2] target/riscv: improve code accuracy and

2023-07-08 Thread Ruibo Lu
I'm so sorry. As a newcomer, I'm not familiar with the patch mechanism. I
mistakenly added the reviewer's "Reviewed-by" line to the wrong commit, so I
have resent this patchset.


Changes in v3:
* fix the alignment of the pmp_is_in_range parameter line

Changes in v2:
* change the initial values of sa_in and ea_in to false
* change the condition expression when address area fully in range

Ruibo Lu (2):
  target/riscv: Remove redundant check in pmp_is_locked
  target/riscv: Optimize ambiguous local variable in pmp_hart_has_privs

 target/riscv/pmp.c | 27 +++
 1 file changed, 11 insertions(+), 16 deletions(-)

-- 
2.41.0




[PATCH v3 1/2] target/riscv: Remove redundant check in pmp_is_locked

2023-07-08 Thread Ruibo Lu
The check of the top PMP is redundant and does not influence the return
value, so remove it.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1733
Reviewed-by: Weiwei Li 
Reviewed-by: Alistair Francis 
Signed-off-by: Ruibo Lu 
---
 target/riscv/pmp.c | 5 -
 1 file changed, 5 deletions(-)

diff --git a/target/riscv/pmp.c b/target/riscv/pmp.c
index 9d8db493e6..1a9279ba88 100644
--- a/target/riscv/pmp.c
+++ b/target/riscv/pmp.c
@@ -49,11 +49,6 @@ static inline int pmp_is_locked(CPURISCVState *env, uint32_t 
pmp_index)
 return 1;
 }
 
-/* Top PMP has no 'next' to check */
-if ((pmp_index + 1u) >= MAX_RISCV_PMPS) {
-return 0;
-}
-
 return 0;
 }
 
-- 
2.41.0




Re: Reducing vdpa migration downtime because of memory pin / maps

2023-07-08 Thread Si-Wei Liu




On 7/5/2023 10:46 PM, Eugenio Perez Martin wrote:

On Thu, Jul 6, 2023 at 2:13 AM Si-Wei Liu  wrote:



On 7/5/2023 11:03 AM, Eugenio Perez Martin wrote:

On Tue, Jun 27, 2023 at 8:36 AM Si-Wei Liu  wrote:


On 6/9/2023 7:32 AM, Eugenio Perez Martin wrote:

On Fri, Jun 9, 2023 at 12:39 AM Si-Wei Liu  wrote:

On 6/7/23 01:08, Eugenio Perez Martin wrote:

On Wed, Jun 7, 2023 at 12:43 AM Si-Wei Liu  wrote:

Sorry for reviving this old thread; I lost the best timing to follow up
on this while I was on vacation. I have been working on this and found
some discrepancies, please see below.

On 4/5/23 04:37, Eugenio Perez Martin wrote:

Hi!

As mentioned in the last upstream virtio-networking meeting, one of
the factors that adds more downtime to migration is the handling of
the guest memory (pin, map, etc). At this moment this handling is
bound to the virtio life cycle (DRIVER_OK, RESET). In that sense, the
destination device waits until all the guest memory / state is
migrated to start pinning all the memory.

The proposal is to bind it to the char device life cycle (open vs
close),

Hmmm, really? If it's the life cycle for char device, the next guest /
qemu launch on the same vhost-vdpa device node won't make it work.


Maybe my sentence was not accurate, but I think we're on the same page here.

Two qemu instances opening the same char device at the same time are
not allowed, and vhost_vdpa_release clean all the maps. So the next
qemu that opens the char device should see a clean device anyway.

I mean the pin can't be done at the time of char device open, where the
user address space is not known/bound yet. The earliest point possible
for pinning would be until the vhost_attach_mm() call from SET_OWNER is
done.

Maybe we are deviating, let me start again.

Using QEMU code, what I'm proposing is to modify the lifecycle of the
.listener member of struct vhost_vdpa.

At this moment, the memory listener is registered at
vhost_vdpa_dev_start(dev, started=true) call for the last vhost_dev,
and is unregistered in both vhost_vdpa_reset_status and
vhost_vdpa_cleanup.

My original proposal was just to move the memory listener registration
to the last vhost_vdpa_init, and remove the unregister from
vhost_vdpa_reset_status. The calls to vhost_vdpa_dma_map/unmap would
be the same, the device should not realize this change.

This can address LM downtime latency for sure, but it won't help
downtime during the dynamic SVQ switch - which still needs to go through the
full unmap/map cycle (including the slow pinning part) from
passthrough to SVQ mode. Note that not every device can work with a
separate ASID for SVQ descriptors. The fix should be expected to work on
normal vDPA vendor devices without a separate descriptor ASID, with a
platform IOMMU underneath or with an on-chip IOMMU.


At this moment the SVQ switch is very inefficient mapping-wise, as it
unmap all the GPA->HVA maps and overrides it. In particular, SVQ is
allocated in low regions of the iova space, and then the guest memory
is allocated in this new IOVA region incrementally.

Yep. The key to build this fast path for SVQ switching I think is to
maintain the identity mapping for the passthrough queues so that QEMU
can reuse the old mappings for guest memory (e.g. GIOVA identity mapped
to GPA) while incrementally adding new mappings for SVQ vrings.


We can optimize that if we place SVQ in a free GPA area instead.

Here's a question though: it might not be hard to find a free GPA range
for the non-vIOMMU case (allocate iova from beyond the 48bit or 52bit
ranges), but I'm not sure if easy to find a free GIOVA range for the
vIOMMU case - particularly this has to work in the same entire 64bit
IOVA address ranges that (for now) QEMU won't be able to "reserve" a
specific IOVA ranges for SVQ from the vIOMMU. Do you foresee this can be
done for every QEMU emulated vIOMMU (intel-iommu amd-iommu, arm smmu and
virito-iommu) so that we can call it out as a generic means for SVQ
switching optimization?


In the case vIOMMU allocates a new block we will use the same algorithm as now:
* Find a new free IOVA chunk of the same size
* Map this new SVQ IOVA, that may or may not be the same as SVQ

Since we must go through the translation phase to sanitize guest's
available descriptors anyway, it has zero added cost.
Not sure I followed; this can work, but it doesn't seem able to reuse the
old host kernel mappings for guest memory, hence it still requires a remap of
the entire host IOVA range when the SVQ IOVA comes along. I think by
maintaining a 1:1 identity map on guest memory, we don't have to bother
tearing down existing HVA->HPA mappings in the kernel, thus saving the
expensive pinning calls at large. I don't clearly see, under this scheme,
how the new SVQ IOVA may work with a potential conflict on the IOVA space from
hotplugged memory - in that case the 1:1 IOVA->GPA identity guest memory
mapping can't be kept.



Another option would be to move the SVQ vring to a new region, but I
don't see an

[PATCH v3 0/2] Vhost-vdpa Shadow Virtqueue _F_CTRL_RX_EXTRA commands support

2023-07-08 Thread Hawkins Jiawei
This series enables the shadowed CVQ to intercept rx commands related to the
VIRTIO_NET_F_CTRL_RX_EXTRA feature, updates the virtio
NIC device model so qemu sends that state in a migration, and restores
the rx state in the destination.

To test this patch series, one should modify the `n->parent_obj.guest_features`
value in vhost_vdpa_net_load_rx() using gdb, as the linux virtio-net
driver does not currently support the VIRTIO_NET_F_CTRL_RX_EXTRA
feature.

Note that this patch should be based on
[1] patch "Vhost-vdpa Shadow Virtqueue _F_CTRL_RX commands support"

[1]. https://lore.kernel.org/all/cover.1688743107.git.yin31...@gmail.com/

TestStep

1. test the patch series using vp-vdpa device

  - For L0 guest, boot QEMU with virtio-net-pci net device with
`ctrl_vq`, `ctrl_rx` and `ctrl_rx_extra` feature on, something like:
  -device virtio-net-pci,rx_queue_size=256,tx_queue_size=256,
iommu_platform=on,ctrl_vq=on,ctrl_rx=on,ctrl_rx_extra=on...

  - For L1 guest, apply the patch series and compile the code,
start QEMU with vdpa device with svq mode and enable the
`ctrl_vq`, `ctrl_rx` and `ctrl_rx_extra` feature on, something like:
  -netdev type=vhost-vdpa,x-svq=true,...
  -device virtio-net-pci,ctrl_vq=on,ctrl_rx=on,ctrl_rx_extra=on...
Use gdb to attach the VM and break at the net/vhost-vdpa.c:870.

With this series, gdb can hit the breakpoint. Enable the
VIRTIO_NET_F_CTRL_RX_EXTRA feature and enable the non-unicast mode
by entering the following gdb commands:
```gdb
set n->parent_obj.guest_features |= (1 << 20)
set n->nouni = 1
c
```
QEMU should not trigger any errors or warnings.

Without this series, QEMU should fail with
"x-svq=true: vdpa svq does not work with features 0x10".

ChangeLog
=
v3:
  - return early if mismatch the condition suggested by Eugenio in
patch 1 "vdpa: Restore packet receive filtering state relative with
_F_CTRL_RX_EXTRA feature"
  - remove the `on` variable suggested by Eugenio in patch 1 "vdpa:
Restore packet receive filtering state relative with
_F_CTRL_RX_EXTRA feature"

v2: https://lore.kernel.org/all/cover.1688365324.git.yin31...@gmail.com/
  - avoid sending CVQ command in default state suggested by Eugenio

v1: https://lists.nongnu.org/archive/html/qemu-devel/2023-06/msg04956.html

Hawkins Jiawei (2):
  vdpa: Restore packet receive filtering state relative with
_F_CTRL_RX_EXTRA feature
  vdpa: Allow VIRTIO_NET_F_CTRL_RX_EXTRA in SVQ

 net/vhost-vdpa.c | 89 
 1 file changed, 89 insertions(+)

-- 
2.25.1




Re: [PATCH v7 12/15] target/riscv: Add Zvkg ISA extension support

2023-07-08 Thread Daniel Henrique Barboza

Hi,

This patch breaks some gitlab runners because of this:

On 7/2/23 12:53, Max Chou wrote:

From: Nazar Kazakov 

This commit adds support for the Zvkg vector-crypto extension, which
consists of the following instructions:

* vgmul.vv
* vghsh.vv

Translation functions are defined in
`target/riscv/insn_trans/trans_rvvk.c.inc` and helpers are defined in
`target/riscv/vcrypto_helper.c`.

Co-authored-by: Lawrence Hunter 
[max.c...@sifive.com: Replaced vstart checking by TCG op]
Signed-off-by: Lawrence Hunter 
Signed-off-by: Nazar Kazakov 
Signed-off-by: Max Chou 
Reviewed-by: Daniel Henrique Barboza 
[max.c...@sifive.com: Exposed x-zvkg property]
---
  target/riscv/cpu.c   |  6 +-
  target/riscv/cpu_cfg.h   |  1 +
  target/riscv/helper.h|  3 +
  target/riscv/insn32.decode   |  4 ++
  target/riscv/insn_trans/trans_rvvk.c.inc | 30 ++
  target/riscv/vcrypto_helper.c| 72 
  6 files changed, 114 insertions(+), 2 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 08b8355f52..699ab5e9fa 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -118,6 +118,7 @@ static const struct isa_ext_data isa_edata_arr[] = {
  ISA_EXT_DATA_ENTRY(zve64d, PRIV_VERSION_1_10_0, ext_zve64d),
  ISA_EXT_DATA_ENTRY(zvfh, PRIV_VERSION_1_12_0, ext_zvfh),
  ISA_EXT_DATA_ENTRY(zvfhmin, PRIV_VERSION_1_12_0, ext_zvfhmin),
+ISA_EXT_DATA_ENTRY(zvkg, PRIV_VERSION_1_12_0, ext_zvkg),
  ISA_EXT_DATA_ENTRY(zvkned, PRIV_VERSION_1_12_0, ext_zvkned),
  ISA_EXT_DATA_ENTRY(zvknha, PRIV_VERSION_1_12_0, ext_zvknha),
  ISA_EXT_DATA_ENTRY(zvknhb, PRIV_VERSION_1_12_0, ext_zvknhb),
@@ -1194,8 +1195,8 @@ void riscv_cpu_validate_set_extensions(RISCVCPU *cpu, 
Error **errp)
   * In principle Zve*x would also suffice here, were they supported
   * in qemu
   */
-if ((cpu->cfg.ext_zvbb || cpu->cfg.ext_zvkned || cpu->cfg.ext_zvknha ||
- cpu->cfg.ext_zvksh) && !cpu->cfg.ext_zve32f) {
+if ((cpu->cfg.ext_zvbb || cpu->cfg.ext_zvkg || cpu->cfg.ext_zvkned ||
+ cpu->cfg.ext_zvknha || cpu->cfg.ext_zvksh) && !cpu->cfg.ext_zve32f) {
  error_setg(errp,
 "Vector crypto extensions require V or Zve* extensions");
  return;
@@ -1710,6 +1711,7 @@ static Property riscv_cpu_extensions[] = {
  /* Vector cryptography extensions */
  DEFINE_PROP_BOOL("x-zvbb", RISCVCPU, cfg.ext_zvbb, false),
  DEFINE_PROP_BOOL("x-zvbc", RISCVCPU, cfg.ext_zvbc, false),
+DEFINE_PROP_BOOL("x-zvkg", RISCVCPU, cfg.ext_zvkg, false),
  DEFINE_PROP_BOOL("x-zvkned", RISCVCPU, cfg.ext_zvkned, false),
  DEFINE_PROP_BOOL("x-zvknha", RISCVCPU, cfg.ext_zvknha, false),
  DEFINE_PROP_BOOL("x-zvknhb", RISCVCPU, cfg.ext_zvknhb, false),
diff --git a/target/riscv/cpu_cfg.h b/target/riscv/cpu_cfg.h
index 27062b12a8..960761c479 100644
--- a/target/riscv/cpu_cfg.h
+++ b/target/riscv/cpu_cfg.h
@@ -85,6 +85,7 @@ struct RISCVCPUConfig {
  bool ext_zve64d;
  bool ext_zvbb;
  bool ext_zvbc;
+bool ext_zvkg;
  bool ext_zvkned;
  bool ext_zvknha;
  bool ext_zvknhb;
diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 172c91c65c..238343cb42 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1244,3 +1244,6 @@ DEF_HELPER_5(vsha2cl64_vv, void, ptr, ptr, ptr, env, i32)
  
  DEF_HELPER_5(vsm3me_vv, void, ptr, ptr, ptr, env, i32)

  DEF_HELPER_5(vsm3c_vi, void, ptr, ptr, i32, env, i32)
+
+DEF_HELPER_5(vghsh_vv, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_4(vgmul_vv, void, ptr, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 5ca83e8462..b10497afd3 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -957,3 +957,7 @@ vsha2cl_vv  10 1 . . 010 . 1110111 @r_vm_1
  # *** Zvksh vector crypto extension ***
  vsm3me_vv   10 1 . . 010 . 1110111 @r_vm_1
  vsm3c_vi101011 1 . . 010 . 1110111 @r_vm_1
+
+# *** Zvkg vector crypto extension ***
+vghsh_vv101100 1 . . 010 . 1110111 @r_vm_1
+vgmul_vv101000 1 . 10001 010 . 1110111 @r2_vm_1
diff --git a/target/riscv/insn_trans/trans_rvvk.c.inc 
b/target/riscv/insn_trans/trans_rvvk.c.inc
index 6469dd2f02..af7cd62e7d 100644
--- a/target/riscv/insn_trans/trans_rvvk.c.inc
+++ b/target/riscv/insn_trans/trans_rvvk.c.inc
@@ -531,3 +531,33 @@ static inline bool vsm3c_check(DisasContext *s, arg_rmrr 
*a)
  
  GEN_VV_UNMASKED_TRANS(vsm3me_vv, vsm3me_check, ZVKSH_EGS)

  GEN_VI_UNMASKED_TRANS(vsm3c_vi, vsm3c_check, ZVKSH_EGS)
+
+/*
+ * Zvkg
+ */
+
+#define ZVKG_EGS 4
+
+static bool vgmul_check(DisasContext *s, arg_rmr *a)
+{
+int egw_bytes = ZVKG_EGS << s->sew;
+return s->cfg_ptr->ext_zvkg == true &&
+   vext_check_isa_ill(s) &&
+   require_rvv(s) &&
+   MAXSZ(s) >= egw_bytes &&
+   vext_check_ss(s, a->rd, a->rs2, a->vm) &&
+

[PATCH v3 1/2] vdpa: Restore packet receive filtering state relative with _F_CTRL_RX_EXTRA feature

2023-07-08 Thread Hawkins Jiawei
This patch refactors vhost_vdpa_net_load_rx() to
restore the packet receive filtering state in relation to the
VIRTIO_NET_F_CTRL_RX_EXTRA feature at the device's startup.

Signed-off-by: Hawkins Jiawei 
---
v3:
  - return early if mismatch the condition suggested by Eugenio
  - remove the `on` variable suggested by Eugenio

v2: 
https://lore.kernel.org/all/66ec4d7e3a680de645043d0331ab65940154f2b8.1688365324.git.yin31...@gmail.com/
  - avoid sending CVQ command in default state suggested by Eugenio

v1: https://lists.nongnu.org/archive/html/qemu-devel/2023-06/msg04957.html

 net/vhost-vdpa.c | 88 
 1 file changed, 88 insertions(+)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 0994836f8c..9a1905fddd 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -867,6 +867,94 @@ static int vhost_vdpa_net_load_rx(VhostVDPAState *s,
 }
 }
 
+if (!virtio_vdev_has_feature(&n->parent_obj, VIRTIO_NET_F_CTRL_RX_EXTRA)) {
+return 0;
+}
+
+/*
+ * According to virtio_net_reset(), device turns all-unicast mode
+ * off by default.
+ *
+ * Therefore, QEMU should only send this CVQ command if the driver
+ * sets all-unicast mode on, different from the device's defaults.
+ *
+ * Note that the device's defaults can mismatch the driver's
+ * configuration only at live migration.
+ */
+if (n->alluni) {
+dev_written = vhost_vdpa_net_load_rx_mode(s,
+VIRTIO_NET_CTRL_RX_ALLUNI, 1);
+if (dev_written < 0) {
+return dev_written;
+}
+if (*s->status != VIRTIO_NET_OK) {
+return -EIO;
+}
+}
+
+/*
+ * According to virtio_net_reset(), device turns non-multicast mode
+ * off by default.
+ *
+ * Therefore, QEMU should only send this CVQ command if the driver
+ * sets non-multicast mode on, different from the device's defaults.
+ *
+ * Note that the device's defaults can mismatch the driver's
+ * configuration only at live migration.
+ */
+if (n->nomulti) {
+dev_written = vhost_vdpa_net_load_rx_mode(s,
+VIRTIO_NET_CTRL_RX_NOMULTI, 1);
+if (dev_written < 0) {
+return dev_written;
+}
+if (*s->status != VIRTIO_NET_OK) {
+return -EIO;
+}
+}
+
+/*
+ * According to virtio_net_reset(), device turns non-unicast mode
+ * off by default.
+ *
+ * Therefore, QEMU should only send this CVQ command if the driver
+ * sets non-unicast mode on, different from the device's defaults.
+ *
+ * Note that the device's defaults can mismatch the driver's
+ * configuration only at live migration.
+ */
+if (n->nouni) {
+dev_written = vhost_vdpa_net_load_rx_mode(s,
+VIRTIO_NET_CTRL_RX_NOUNI, 1);
+if (dev_written < 0) {
+return dev_written;
+}
+if (*s->status != VIRTIO_NET_OK) {
+return -EIO;
+}
+}
+
+/*
+ * According to virtio_net_reset(), device turns non-broadcast mode
+ * off by default.
+ *
+ * Therefore, QEMU should only send this CVQ command if the driver
+ * sets non-broadcast mode on, different from the device's defaults.
+ *
+ * Note that the device's defaults can mismatch the driver's
+ * configuration only at live migration.
+ */
+if (n->nobcast) {
+dev_written = vhost_vdpa_net_load_rx_mode(s,
+VIRTIO_NET_CTRL_RX_NOBCAST, 1);
+if (dev_written < 0) {
+return dev_written;
+}
+if (*s->status != VIRTIO_NET_OK) {
+return -EIO;
+}
+}
+
 return 0;
 }
 
-- 
2.25.1
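The four blocks in the patch above are structurally identical, differing only in the command byte and the checked flag; they could be driven by a table. A self-contained sketch of that idea follows. Everything here is a stand-in: the enum values are illustrative (not the spec's VIRTIO_NET_CTRL_RX_* numbers), `load_rx_mode()` merely records commands instead of issuing a real CVQ request, and `struct net_state` abstracts the VirtIONet fields.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for the VIRTIO_NET_CTRL_RX_* command bytes. */
enum { RX_ALLUNI = 100, RX_NOMULTI, RX_NOUNI, RX_NOBCAST };

/* Abstracted driver-visible rx filter state (n->alluni etc.). */
struct net_state { bool alluni, nomulti, nouni, nobcast; };

static int sent[4];
static int nsent;

/* Records the command instead of sending a real CVQ request. */
static int load_rx_mode(int cmd, int on)
{
    (void)on;
    sent[nsent++] = cmd;
    return 0;
}

/* Table-driven equivalent of the four copy-pasted blocks: only send a
 * command when the driver state differs from the device default (off). */
static int load_rx(const struct net_state *n)
{
    const struct { int cmd; bool on; } modes[] = {
        { RX_ALLUNI,  n->alluni  },
        { RX_NOMULTI, n->nomulti },
        { RX_NOUNI,   n->nouni   },
        { RX_NOBCAST, n->nobcast },
    };
    for (unsigned i = 0; i < sizeof(modes) / sizeof(modes[0]); i++) {
        if (!modes[i].on) {
            continue;               /* matches the device default */
        }
        int r = load_rx_mode(modes[i].cmd, 1);
        if (r < 0) {
            return r;
        }
    }
    return 0;
}
```

The real code would also check `*s->status != VIRTIO_NET_OK` after each command, exactly as each block in the patch does.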




[PATCH v3 2/2] vdpa: Allow VIRTIO_NET_F_CTRL_RX_EXTRA in SVQ

2023-07-08 Thread Hawkins Jiawei
Enable SVQ with VIRTIO_NET_F_CTRL_RX_EXTRA feature.

Signed-off-by: Hawkins Jiawei 
Acked-by: Eugenio Pérez 
---
 net/vhost-vdpa.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 9a1905fddd..1df82636c9 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -105,6 +105,7 @@ static const uint64_t vdpa_svq_device_features =
 BIT_ULL(VIRTIO_NET_F_STATUS) |
 BIT_ULL(VIRTIO_NET_F_CTRL_VQ) |
 BIT_ULL(VIRTIO_NET_F_CTRL_RX) |
+BIT_ULL(VIRTIO_NET_F_CTRL_RX_EXTRA) |
 BIT_ULL(VIRTIO_NET_F_MQ) |
 BIT_ULL(VIRTIO_F_ANY_LAYOUT) |
 BIT_ULL(VIRTIO_NET_F_CTRL_MAC_ADDR) |
-- 
2.25.1




Re: [PULL 29/38] gdbstub: Permit reverse step/break to provide stop response

2023-07-08 Thread Alex Bennée


Michael Tokarev  writes:

> 03.07.2023 16:44, Alex Bennée wrote:
>> From: Nicholas Piggin 
>> The final part of the reverse step and break handling is to bring
>> the machine back to a debug stop state. gdb expects a response.
>> A gdb 'rsi' command hangs forever because the gdbstub filters out
>> the response (also observable with reverse_debugging.py avocado
>> tests).
>> Fix by setting allow_stop_reply for the gdb backward packets.
>> Fixes: 758370052fb ("gdbstub: only send stop-reply packets when
>> allowed to")
>> Cc: qemu-sta...@nongnu.org
>
> Hi!
>
> Are you guys sure this needs to be in -stable?
>
> To me it looks a sort of "partial revert" of a previous commit:
>
> commit 758370052fb602f9f23c3b8ae26a6133373c78e6
> Author: Matheus Tavares Bernardino 
> Date:   Thu May 4 12:37:31 2023 -0300
> Subject: gdbstub: only send stop-reply packets when allowed to
>
> which introduced `allow_stop_reply' field in GdbCmdParseEntry.
> This change ("gdbstub: Permit..") does not work in 8.0 without
> the above mentioned "gdbstub: only send" commit, and I guess
> it is *not* supposed to be in stable. Or is it?
>
> I'm not applying this one to stable for now.

Good catch - you're right, it's purely fixing something that was
merged in the current cycle.

>
> Thanks,
>
> /mjt
>
>> Cc: Matheus Tavares Bernardino 
>> Cc: Alex Bennée 
>> Cc: Taylor Simpson 
>> Signed-off-by: Nicholas Piggin 
>> Acked-by: Matheus Tavares Bernardino 
>> Message-Id: <20230623035304.279833-1-npig...@gmail.com>
>> Signed-off-by: Alex Bennée 
>> Message-Id: <20230630180423.558337-30-alex.ben...@linaro.org>
>> diff --git a/gdbstub/gdbstub.c b/gdbstub/gdbstub.c
>> index be18568d0a..9496d7b175 100644
>> --- a/gdbstub/gdbstub.c
>> +++ b/gdbstub/gdbstub.c
>> @@ -1814,6 +1814,7 @@ static int gdb_handle_packet(const char *line_buf)
>>   .handler = handle_backward,
>>   .cmd = "b",
>>   .cmd_startswith = 1,
>> +.allow_stop_reply = true,
>>   .schema = "o0"
>>   };
>>   cmd_parser = &backward_cmd_desc;


-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro



Re: [PATCH v2 1/2] accel/tcg: Split out cpu_exec_longjmp_cleanup

2023-07-08 Thread Alex Bennée


Richard Henderson  writes:

> Share the setjmp cleanup between cpu_exec_step_atomic
> and cpu_exec_setjmp.
>
> Reviewed-by: Richard W.M. Jones 
> Signed-off-by: Richard Henderson 

Reviewed-by: Alex Bennée 

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro



[PULL 2/3] linux-user: Fix accept4(SOCK_NONBLOCK) syscall

2023-07-08 Thread Helge Deller
The Linux accept4() syscall allows two flags only: SOCK_NONBLOCK and
SOCK_CLOEXEC, and returns -EINVAL if any other bits have been set.

Change the qemu implementation accordingly, which means we cannot use
the fcntl_flags_tbl[] translation table, which allows too many other
values.

Besides the correction in behaviour, this actually fixes the accept4()
emulation for the hppa, mips and alpha targets, for which SOCK_NONBLOCK is
different from TARGET_SOCK_NONBLOCK (aka O_NONBLOCK).

The fix can be verified with the testcase of the debian lwt package,
which hangs forever in a read() syscall without this patch.

Signed-off-by: Helge Deller 
Reviewed-by: Richard Henderson 
---
 linux-user/syscall.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 10f05b1e55..9b9e3bd5e3 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -3440,7 +3440,17 @@ static abi_long do_accept4(int fd, abi_ulong target_addr,
 abi_long ret;
 int host_flags;

-host_flags = target_to_host_bitmask(flags, fcntl_flags_tbl);
+if (flags & ~(TARGET_SOCK_CLOEXEC | TARGET_SOCK_NONBLOCK)) {
+return -TARGET_EINVAL;
+}
+
+host_flags = 0;
+if (flags & TARGET_SOCK_NONBLOCK) {
+host_flags |= SOCK_NONBLOCK;
+}
+if (flags & TARGET_SOCK_CLOEXEC) {
+host_flags |= SOCK_CLOEXEC;
+}

 if (target_addr == 0) {
 return get_errno(safe_accept4(fd, NULL, NULL, host_flags));
--
2.41.0




[PULL 0/3] Linux user fcntl64 patches

2023-07-08 Thread Helge Deller
The following changes since commit 97c81ef4b8e203d9620fd46e7eb77004563e3675:

  Merge tag 'pull-9p-20230706' of https://github.com/cschoenebeck/qemu into 
staging (2023-07-06 18:19:42 +0100)

are available in the Git repository at:

  https://github.com/hdeller/qemu-hppa.git tags/linux-user-fcntl64-pull-request

for you to fetch changes up to 036cf169a3484eeca5e17cfbee1f6988043ddd0e:

  linux-user: Improve strace output of pread64() and pwrite64() (2023-07-08 
16:55:08 +0200)


linux-user: Fix fcntl64() and accept4() for 32-bit targets

A set of 3 patches:
The first two patches fix fcntl64() and accept4().
the 3rd patch enhances the strace output for pread64/pwrite64().

This pull request does not include Richard's mmap2 patch:
https://patchew.org/QEMU/20230630132159.376995-1-richard.hender...@linaro.org/20230630132159.376995-12-richard.hender...@linaro.org/

Changes:
v3:
- added r-b from Richard to patches #1 and #2
v2:
- rephrased commit logs
- return O_LARGEFILE for the fcntl() syscall too
- dropped #ifdefs in accept4() patch
- Dropped my mmap2() patch (former patch #3)
- added r-b from Richard to 3rd patch

Helge



Helge Deller (3):
  linux-user: Fix fcntl() and fcntl64() to return O_LARGEFILE for 32-bit
targets
  linux-user: Fix accept4(SOCK_NONBLOCK) syscall
  linux-user: Improve strace output of pread64() and pwrite64()

 linux-user/strace.c| 19 +++
 linux-user/strace.list |  4 ++--
 linux-user/syscall.c   | 16 +++-
 3 files changed, 36 insertions(+), 3 deletions(-)

--
2.41.0




[PULL 1/3] linux-user: Fix fcntl() and fcntl64() to return O_LARGEFILE for 32-bit targets

2023-07-08 Thread Helge Deller
When running a 32-bit guest on a 64-bit host, fcntl[64](F_GETFL) should
return with the TARGET_O_LARGEFILE flag set, because all 64-bit hosts
support large files unconditionally.

But on 64-bit hosts, O_LARGEFILE has the value 0, so the flag
translation can't be done with the fcntl_flags_tbl[]. Instead add the
TARGET_O_LARGEFILE flag afterwards.

Note that for 64-bit guests the compiler will optimize away this code,
since TARGET_O_LARGEFILE is zero.

Signed-off-by: Helge Deller 
Reviewed-by: Richard Henderson 
---
 linux-user/syscall.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 08162cc966..10f05b1e55 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -7132,6 +7132,10 @@ static abi_long do_fcntl(int fd, int cmd, abi_ulong arg)
 ret = get_errno(safe_fcntl(fd, host_cmd, arg));
 if (ret >= 0) {
 ret = host_to_target_bitmask(ret, fcntl_flags_tbl);
+/* tell 32-bit guests it uses largefile on 64-bit hosts: */
+if (O_LARGEFILE == 0 && HOST_LONG_BITS == 64) {
+ret |= TARGET_O_LARGEFILE;
+}
 }
 break;

--
2.41.0
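The effect of the fix can be illustrated with a small stand-alone sketch. The flag values and helper name below are made up for illustration (QEMU's real translation goes through fcntl_flags_tbl[]); the point is that a bitmask table can never produce a flag whose host value is 0:

```c
/*
 * Minimal sketch (not QEMU code) of the problem fixed above: a
 * bitmask-translation table cannot map a host flag whose value is 0,
 * so TARGET_O_LARGEFILE has to be OR-ed in afterwards.
 * Flag values are illustrative.
 */
#include <assert.h>

#define HOST_O_LARGEFILE    0        /* 0 on 64-bit Linux hosts */
#define TARGET_O_LARGEFILE  0100000  /* nonzero on a 32-bit target */

static int host_to_target_fcntl_flags(int host_flags)
{
    int target_flags = 0;

    /* Table-style translation: only set bits can match, so with
     * HOST_O_LARGEFILE == 0 this test is always false. */
    if (host_flags & HOST_O_LARGEFILE) {
        target_flags |= TARGET_O_LARGEFILE;
    }

    /* The fix: a 64-bit host supports large files unconditionally,
     * so report the flag to 32-bit guests explicitly. */
    if (HOST_O_LARGEFILE == 0) {
        target_flags |= TARGET_O_LARGEFILE;
    }
    return target_flags;
}
```

As the patch notes, for 64-bit guests TARGET_O_LARGEFILE is 0 as well, so the extra OR compiles away.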




[PULL 3/3] linux-user: Improve strace output of pread64() and pwrite64()

2023-07-08 Thread Helge Deller
Make the strace look nicer for those two syscalls.

Signed-off-by: Helge Deller 
Reviewed-by: Richard Henderson 
---
 linux-user/strace.c| 19 +++
 linux-user/strace.list |  4 ++--
 2 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/linux-user/strace.c b/linux-user/strace.c
index aad2b62ca4..669200c4a4 100644
--- a/linux-user/strace.c
+++ b/linux-user/strace.c
@@ -3999,6 +3999,25 @@ print_tgkill(CPUArchState *cpu_env, const struct 
syscallname *name,
 }
 #endif

+#if defined(TARGET_NR_pread64) || defined(TARGET_NR_pwrite64)
+static void
+print_pread64(CPUArchState *cpu_env, const struct syscallname *name,
+abi_long arg0, abi_long arg1, abi_long arg2,
+abi_long arg3, abi_long arg4, abi_long arg5)
+{
+if (regpairs_aligned(cpu_env, TARGET_NR_pread64)) {
+arg3 = arg4;
+arg4 = arg5;
+}
+print_syscall_prologue(name);
+print_raw_param("%d", arg0, 0);
+print_pointer(arg1, 0);
+print_raw_param("%d", arg2, 0);
+print_raw_param("%" PRIu64, target_offset64(arg3, arg4), 1);
+print_syscall_epilogue(name);
+}
+#endif
+
 #ifdef TARGET_NR_statx
 static void
 print_statx(CPUArchState *cpu_env, const struct syscallname *name,
diff --git a/linux-user/strace.list b/linux-user/strace.list
index c7808ea118..6655d4f26d 100644
--- a/linux-user/strace.list
+++ b/linux-user/strace.list
@@ -1068,7 +1068,7 @@
 { TARGET_NR_prctl, "prctl" , NULL, NULL, NULL },
 #endif
 #ifdef TARGET_NR_pread64
-{ TARGET_NR_pread64, "pread64" , NULL, NULL, NULL },
+{ TARGET_NR_pread64, "pread64" , NULL, print_pread64, NULL },
 #endif
 #ifdef TARGET_NR_preadv
 { TARGET_NR_preadv, "preadv" , NULL, NULL, NULL },
@@ -1099,7 +1099,7 @@
 { TARGET_NR_putpmsg, "putpmsg" , NULL, NULL, NULL },
 #endif
 #ifdef TARGET_NR_pwrite64
-{ TARGET_NR_pwrite64, "pwrite64" , NULL, NULL, NULL },
+{ TARGET_NR_pwrite64, "pwrite64" , NULL, print_pread64, NULL },
 #endif
 #ifdef TARGET_NR_pwritev
 { TARGET_NR_pwritev, "pwritev" , NULL, NULL, NULL },
--
2.41.0
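The target_offset64() call used in print_pread64() reassembles a 64-bit file offset that 32-bit targets pass in two 32-bit syscall arguments. A stand-alone sketch (the helper name and the explicit word-order parameter are illustrative; QEMU derives the order from the target configuration):

```c
/*
 * Sketch of how a 64-bit file offset is carried in two 32-bit
 * syscall arguments on 32-bit targets, as target_offset64() does
 * in QEMU.  Which word holds the high half depends on the target
 * ABI/endianness; this helper takes that as a parameter.
 */
#include <assert.h>
#include <stdint.h>

static uint64_t offset64(uint32_t word0, uint32_t word1, int high_word_first)
{
    if (high_word_first) {
        return ((uint64_t)word0 << 32) | word1;
    }
    return ((uint64_t)word1 << 32) | word0;
}
```

The regpairs_aligned() shuffle in the patch exists for the same reason: some 32-bit ABIs pad the argument list so the 64-bit pair starts on an even register.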




Re: [PULL trival-patches 00/10] trivial-patches for 2023-07-08

2023-07-08 Thread Richard Henderson

On 7/8/23 06:12, Michael Tokarev wrote:

The following changes since commit 3b08e40b7abfe8be6020c4c27c93ad85590b9213:

   Merge tag 'for-upstream' ofhttps://gitlab.com/bonzini/qemu  into staging 
(2023-07-07 20:23:01 +0100)

are available in the Git repository at:

   https://gitlab.com/mjt0k/qemu.git  tags/trivial-patches-20230708

for you to fetch changes up to 13a637430be13bda3e6726752936321a1955bc93:

   hw/arm/virt-acpi-build.c: Add missing header (2023-07-08 07:24:38 +0300)


qemu trivial patches for 2023-07-08


Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/8.1 as 
appropriate.


r~




Re: [PATCH v4 26/37] target/arm: Use aesdec_ISB_ISR_AK

2023-07-08 Thread Philippe Mathieu-Daudé

On 3/7/23 12:05, Richard Henderson wrote:

This implements the AESD instruction.

Signed-off-by: Richard Henderson 
---
  target/arm/tcg/crypto_helper.c | 37 +++---
  1 file changed, 16 insertions(+), 21 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v4 25/37] target/arm: Use aesenc_SB_SR_AK

2023-07-08 Thread Philippe Mathieu-Daudé

On 3/7/23 12:05, Richard Henderson wrote:

This implements the AESE instruction.

Signed-off-by: Richard Henderson 
---
  target/arm/tcg/crypto_helper.c | 24 +++-
  1 file changed, 23 insertions(+), 1 deletion(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v4 27/37] target/arm: Use aesenc_MC

2023-07-08 Thread Philippe Mathieu-Daudé

On 3/7/23 12:05, Richard Henderson wrote:

This implements the AESMC instruction.

Signed-off-by: Richard Henderson 
---
  target/arm/tcg/crypto_helper.c | 15 ++-
  1 file changed, 14 insertions(+), 1 deletion(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v4 28/37] target/arm: Use aesdec_IMC

2023-07-08 Thread Philippe Mathieu-Daudé

On 3/7/23 12:05, Richard Henderson wrote:

This implements the AESIMC instruction.  We have converted everything
to crypto/aes-round.h; crypto/aes.h is no longer needed.

Signed-off-by: Richard Henderson 
---
  target/arm/tcg/crypto_helper.c | 33 ++---
  1 file changed, 14 insertions(+), 19 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v4 31/37] target/riscv: Use aesdec_IMC

2023-07-08 Thread Philippe Mathieu-Daudé

On 3/7/23 12:05, Richard Henderson wrote:

This implements the AES64IM instruction.

Signed-off-by: Richard Henderson 
---
  target/riscv/crypto_helper.c | 15 +--
  1 file changed, 5 insertions(+), 10 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH] linux-user: make sure brk(0) returns a page-aligned value

2023-07-08 Thread Richard Henderson

On 7/6/23 12:34, Andreas Schwab wrote:

Fixes: 86f04735ac ("linux-user: Fix brk() to release pages")
Signed-off-by: Andreas Schwab 
---
  linux-user/syscall.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 08162cc966..e8a17377f5 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -805,7 +805,7 @@ static abi_ulong brk_page;
  
  void target_set_brk(abi_ulong new_brk)

  {
-target_brk = new_brk;
+target_brk = TARGET_PAGE_ALIGN(new_brk);
  brk_page = HOST_PAGE_ALIGN(target_brk);
  }
  


It makes sense, since that's how do_brk aligns things.
I'm curious why this error might have produced host memory clobbering, but I'm not going 
to debug that.


Queuing for tcg/linux-user.


r~
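The rounding being applied can be checked in isolation. A sketch of the TARGET_PAGE_ALIGN() round-up, assuming a 4 KiB target page size, using the unaligned initial brk value 0x003022e8 reported in the RISC-V thread elsewhere in this archive:

```c
/*
 * Stand-alone sketch of the TARGET_PAGE_ALIGN() rounding used by
 * target_set_brk().  A 4096-byte target page size is assumed here.
 */
#include <assert.h>
#include <stdint.h>

#define TARGET_PAGE_SIZE  4096u
#define TARGET_PAGE_MASK  (~(uint64_t)(TARGET_PAGE_SIZE - 1))

static uint64_t target_page_align(uint64_t addr)
{
    /* Round up to the next page boundary. */
    return (addr + TARGET_PAGE_SIZE - 1) & TARGET_PAGE_MASK;
}
```

With 0x003022e8 this yields 0x00303000, matching the brk(NULL) value a working run reports.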



Re: [PATCH v4 33/37] target/riscv: Use aesdec_ISB_ISR_IMC_AK

2023-07-08 Thread Philippe Mathieu-Daudé

On 3/7/23 12:05, Richard Henderson wrote:

This implements the AES64DSM instruction.  This was the last use
of aes64_operation and its support macros, so remove them all.

Signed-off-by: Richard Henderson 
---
  target/riscv/crypto_helper.c | 101 ---
  1 file changed, 10 insertions(+), 91 deletions(-)




  target_ulong HELPER(aes64esm)(target_ulong rs1, target_ulong rs2)
  {
  AESState t;
@@ -228,7 +138,16 @@ target_ulong HELPER(aes64ds)(target_ulong rs1, 
target_ulong rs2)
  
  target_ulong HELPER(aes64dsm)(target_ulong rs1, target_ulong rs2)

  {
-return aes64_operation(rs1, rs2, false, true);
+AESState t, z = { };


z can be const, otherwise:

Reviewed-by: Philippe Mathieu-Daudé 


+
+/*
+ * This instruction does not include a round key,
+ * so supply a zero to our primitive.
+ */
+t.d[HOST_BIG_ENDIAN] = rs1;
+t.d[!HOST_BIG_ENDIAN] = rs2;
+aesdec_ISB_ISR_IMC_AK(&t, &t, &z, false);
+return t.d[HOST_BIG_ENDIAN];
  }
  
  target_ulong HELPER(aes64ks2)(target_ulong rs1, target_ulong rs2)





Re: [PATCH v4 13/37] host/include/aarch64: Implement aes-round.h

2023-07-08 Thread Philippe Mathieu-Daudé

+Ard

On 3/7/23 12:04, Richard Henderson wrote:

Detect AES in cpuinfo; implement the accel hooks.

Signed-off-by: Richard Henderson 
---
  meson.build  |   9 +
  host/include/aarch64/host/cpuinfo.h  |   1 +
  host/include/aarch64/host/crypto/aes-round.h | 205 +++
  util/cpuinfo-aarch64.c   |   2 +
  4 files changed, 217 insertions(+)
  create mode 100644 host/include/aarch64/host/crypto/aes-round.h

diff --git a/meson.build b/meson.build
index a9ba0bfab3..029c6c0048 100644
--- a/meson.build
+++ b/meson.build
@@ -2674,6 +2674,15 @@ config_host_data.set('CONFIG_AVX512BW_OPT', 
get_option('avx512bw') \
  int main(int argc, char *argv[]) { return bar(argv[0]); }
'''), error_message: 'AVX512BW not available').allowed())
  
+# For both AArch64 and AArch32, detect if builtins are available.

+config_host_data.set('CONFIG_ARM_AES_BUILTIN', cc.compiles('''
+#include 
+#ifndef __ARM_FEATURE_AES
+__attribute__((target("+crypto")))
+#endif
+void foo(uint8x16_t *p) { *p = vaesmcq_u8(*p); }
+  '''))
+
  have_pvrdma = get_option('pvrdma') \
.require(rdma.found(), error_message: 'PVRDMA requires OpenFabrics 
libraries') \
.require(cc.compiles(gnu_source_prefix + '''
diff --git a/host/include/aarch64/host/cpuinfo.h 
b/host/include/aarch64/host/cpuinfo.h
index 82227890b4..05feeb4f43 100644
--- a/host/include/aarch64/host/cpuinfo.h
+++ b/host/include/aarch64/host/cpuinfo.h
@@ -9,6 +9,7 @@
  #define CPUINFO_ALWAYS  (1u << 0)  /* so cpuinfo is nonzero */
  #define CPUINFO_LSE (1u << 1)
  #define CPUINFO_LSE2(1u << 2)
+#define CPUINFO_AES (1u << 3)
  
  /* Initialized with a constructor. */

  extern unsigned cpuinfo;
diff --git a/host/include/aarch64/host/crypto/aes-round.h 
b/host/include/aarch64/host/crypto/aes-round.h
new file mode 100644
index 00..8b5f88d50c
--- /dev/null
+++ b/host/include/aarch64/host/crypto/aes-round.h
@@ -0,0 +1,205 @@
+/*
+ * AArch64 specific aes acceleration.
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef AARCH64_HOST_CRYPTO_AES_ROUND_H
+#define AARCH64_HOST_CRYPTO_AES_ROUND_H
+
+#include "host/cpuinfo.h"
+#include 
+
+#ifdef __ARM_FEATURE_AES
+# define HAVE_AES_ACCEL  true
+#else
+# define HAVE_AES_ACCEL  likely(cpuinfo & CPUINFO_AES)
+#endif
+#if !defined(__ARM_FEATURE_AES) && defined(CONFIG_ARM_AES_BUILTIN)
+# define ATTR_AES_ACCEL  __attribute__((target("+crypto")))
+#else
+# define ATTR_AES_ACCEL
+#endif
+
+static inline uint8x16_t aes_accel_bswap(uint8x16_t x)
+{
+return vqtbl1q_u8(x, (uint8x16_t){ 15, 14, 13, 12, 11, 10, 9, 8,
+7,  6,  5,  4,  3,  2, 1, 0, });
+}
+
+#ifdef CONFIG_ARM_AES_BUILTIN
+# define aes_accel_aesdvaesdq_u8
+# define aes_accel_aesevaeseq_u8
+# define aes_accel_aesmc   vaesmcq_u8
+# define aes_accel_aesimc  vaesimcq_u8
+# define aes_accel_aesd_imc(S, K)  vaesimcq_u8(vaesdq_u8(S, K))
+# define aes_accel_aese_mc(S, K)   vaesmcq_u8(vaeseq_u8(S, K))
+#else
+static inline uint8x16_t aes_accel_aesd(uint8x16_t d, uint8x16_t k)
+{
+asm(".arch_extension aes\n\t"
+"aesd %0.16b, %1.16b" : "+w"(d) : "w"(k));
+return d;
+}
+
+static inline uint8x16_t aes_accel_aese(uint8x16_t d, uint8x16_t k)
+{
+asm(".arch_extension aes\n\t"
+"aese %0.16b, %1.16b" : "+w"(d) : "w"(k));
+return d;
+}
+
+static inline uint8x16_t aes_accel_aesmc(uint8x16_t d)
+{
+asm(".arch_extension aes\n\t"
+"aesmc %0.16b, %1.16b" : "=w"(d) : "w"(d));
+return d;
+}
+
+static inline uint8x16_t aes_accel_aesimc(uint8x16_t d)
+{
+asm(".arch_extension aes\n\t"
+"aesimc %0.16b, %1.16b" : "=w"(d) : "w"(d));
+return d;
+}
+
+/* Most CPUs fuse AESD+AESIMC in the execution pipeline. */
+static inline uint8x16_t aes_accel_aesd_imc(uint8x16_t d, uint8x16_t k)
+{
+asm(".arch_extension aes\n\t"
+"aesd %0.16b, %1.16b\n\t"
+"aesimc %0.16b, %0.16b" : "+w"(d) : "w"(k));
+return d;
+}
+
+/* Most CPUs fuse AESE+AESMC in the execution pipeline. */
+static inline uint8x16_t aes_accel_aese_mc(uint8x16_t d, uint8x16_t k)
+{
+asm(".arch_extension aes\n\t"
+"aese %0.16b, %1.16b\n\t"
+"aesmc %0.16b, %0.16b" : "+w"(d) : "w"(k));
+return d;
+}
+#endif /* CONFIG_ARM_AES_BUILTIN */
+
+static inline void ATTR_AES_ACCEL
+aesenc_MC_accel(AESState *ret, const AESState *st, bool be)
+{
+uint8x16_t t = (uint8x16_t)st->v;
+
+if (be) {
+t = aes_accel_bswap(t);
+t = aes_accel_aesmc(t);
+t = aes_accel_bswap(t);
+} else {
+t = aes_accel_aesmc(t);
+}
+ret->v = (AESStateVec)t;
+}
+
+static inline void ATTR_AES_ACCEL
+aesenc_SB_SR_AK_accel(AESState *ret, const AESState *st,
+  const AESState *rk, bool be)
+{
+uint8x16_t t = (uint8x16_t)st->v;
+uint8x16_t z = { };
+
+if (be) {
+t = aes_accel_bswap(t);
+ 

Re: [PATCH v4 00/37] crypto: Provide aes-round.h and host accel

2023-07-08 Thread Philippe Mathieu-Daudé

On 3/7/23 12:04, Richard Henderson wrote:

Inspired by Ard Biesheuvel's RFC patches for accelerating AES
under emulation, provide a set of primitives that maps between
the guest and host fragments.

Changes for v4:
   * Fix typo in AESState (Max Chou)
   * Define AES_SH/ISH as macros (Ard Biesheuvel)
   * Group patches by subsystem.

Patches lacking review:
   12-host-include-i386-Implement-aes-round.h.patch


Deferring this one to Paolo & co,


   13-host-include-aarch64-Implement-aes-round.h.patch


and this one to Ard :)


Possible cleanup to add in patch #4 "crypto/aes: Add AES_SH,
AES_ISH macros", declare 'extern const AESState aes_zero;' in
include/crypto/aes-round.h and define it in crypto/aes.c.

Regards,

Phil.



Re: [PATCH v2 21/24] accel/tcg: Accept more page flags in page_check_range

2023-07-08 Thread Philippe Mathieu-Daudé

On 7/7/23 22:40, Richard Henderson wrote:

Only PAGE_WRITE needs special attention, all others can be
handled as we do for PAGE_READ.  Adjust the mask.

Signed-off-by: Richard Henderson 
---
  accel/tcg/user-exec.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v2 17/24] linux-user: Use 'last' instead of 'end' in target_mmap

2023-07-08 Thread Philippe Mathieu-Daudé

On 7/7/23 22:40, Richard Henderson wrote:

Complete the transition within the mmap functions to a formulation
that does not overflow at the end of the address space.

Signed-off-by: Richard Henderson 
---
  linux-user/mmap.c | 45 +++--
  1 file changed, 23 insertions(+), 22 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH] linux-user: make sure brk(0) returns a page-aligned value

2023-07-08 Thread Helge Deller

On 7/8/23 19:26, Richard Henderson wrote:

On 7/6/23 12:34, Andreas Schwab wrote:

Fixes: 86f04735ac ("linux-user: Fix brk() to release pages")
Signed-off-by: Andreas Schwab 
---
  linux-user/syscall.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 08162cc966..e8a17377f5 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -805,7 +805,7 @@ static abi_ulong brk_page;
  void target_set_brk(abi_ulong new_brk)
  {
-    target_brk = new_brk;
+    target_brk = TARGET_PAGE_ALIGN(new_brk);
  brk_page = HOST_PAGE_ALIGN(target_brk);
  }


It makes sense, since that's how do_brk aligns things.


Yes, patch looks good.
I haven't tested, but it seems it adjusts the initial brk(0) value
only to make sure that it's target page aligned.
Maybe the title should be:
linux-user: make sure the initial brk(0) is page-aligned


I'm curious why this error might have produced host memory clobbering, but I'm 
not going to debug that.


I don't believe that this un-alignment triggers host memory clobbering either.

Helge



Re: [PATCH] linux-user: make sure brk(0) returns a page-aligned value

2023-07-08 Thread Helge Deller

On 7/8/23 23:36, Helge Deller wrote:

On 7/8/23 19:26, Richard Henderson wrote:

On 7/6/23 12:34, Andreas Schwab wrote:

Fixes: 86f04735ac ("linux-user: Fix brk() to release pages")
Signed-off-by: Andreas Schwab 
---
  linux-user/syscall.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 08162cc966..e8a17377f5 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -805,7 +805,7 @@ static abi_ulong brk_page;
  void target_set_brk(abi_ulong new_brk)
  {
-    target_brk = new_brk;
+    target_brk = TARGET_PAGE_ALIGN(new_brk);
  brk_page = HOST_PAGE_ALIGN(target_brk);
  }

...

I'm curious why this error might have produced host memory clobbering, but I'm 
not going to debug that.

I don't believe that this un-alignment triggers host memory clobbering either.


See my follow-up in the other mail thread:
"Re: [RISC-V] ERROR:../accel/tcg/cpu-exec.c:1028:cpu_exec_setjmp: assertion failed: 
(cpu == current_cpu)"

Helge



Re: [RISC-V] ERROR:../accel/tcg/cpu-exec.c:1028:cpu_exec_setjmp: assertion failed: (cpu == current_cpu)

2023-07-08 Thread Helge Deller

On 7/4/23 12:52, Andreas Schwab wrote:

I think the issue is that the value returned from brk(0) is no longer
page aligned.



$ ./qemu-riscv64 -strace ../exe1
18329 brk(NULL) = 0x00303000
18329 faccessat(AT_FDCWD,"/etc/ld.so.preload",R_OK,0x3010d0) = -1 errno=2 (No 
such file or directory)
18329 openat(AT_FDCWD,"/etc/ld.so.cache",O_RDONLY|O_CLOEXEC) = 3
18329 newfstatat(3,"",0x0040007fe900,0x1000) = 0
18329 mmap(NULL,8799,PROT_READ,MAP_PRIVATE,3,0) = 0x004000824000
18329 close(3) = 0
18329 openat(AT_FDCWD,"/lib64/lp64d/libc.so.6",O_RDONLY|O_CLOEXEC) = 3
18329 read(3,0x7fea70,832) = 832
18329 newfstatat(3,"",0x0040007fe8f0,0x1000) = 0
18329 mmap(NULL,1405128,PROT_EXEC|PROT_READ,MAP_PRIVATE|MAP_DENYWRITE,3,0) = 
0x004000827000
18329 
mmap(0x00400096d000,20480,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_DENYWRITE|MAP_FIXED,3,0x146000)
 = 0x00400096d000
18329 
mmap(0x004000972000,49352,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANONYMOUS|MAP_FIXED,-1,0)
 = 0x004000972000
18329 close(3) = 0
18329 mmap(NULL,8192,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANONYMOUS,-1,0) = 
0x00400097f000
18329 set_tid_address(0x400097f710) = 18329
18329 set_robust_list(0x400097f720,24) = -1 errno=38 (Function not implemented)
18329 mprotect(0x00400096d000,12288,PROT_READ) = 0
18329 mprotect(0x00400082,4096,PROT_READ) = 0
18329 prlimit64(0,RLIMIT_STACK,NULL,0x0040007ff4f8) = 0 
({rlim_cur=8388608,rlim_max=-1})
18329 munmap(0x004000824000,8799) = 0
18329 newfstatat(1,"",0x0040007ff658,0x1000) = 0
18329 getrandom(0x4000976a40,8,1) = 8
18329 brk(NULL) = 0x00303000
18329 brk(0x00324000) = 0x00324000
18329 write(1,0x3032a0,12)Hello world
  = 12
18329 exit_group(0)



$ qemu-riscv64 -strace ../exe1
18369 brk(NULL) = 0x003022e8
18369 faccessat(AT_FDCWD,"/etc/ld.so.preload",R_OK,0x3010d0) = -1 errno=2 (No 
such file or directory)
18369 openat(AT_FDCWD,"/etc/ld.so.cache",O_RDONLY|O_CLOEXEC) = 3
18369 newfstatat(3,"",0x0040007fe8f0,0x1000) = 0
18369 mmap(NULL,8799,PROT_READ,MAP_PRIVATE,3,0) = 0x004000824000
18369 close(3) = 0
18369 openat(AT_FDCWD,"/lib64/lp64d/libc.so.6",O_RDONLY|O_CLOEXEC) = 3
18369 read(3,0x7fea60,832) = 832
18369 newfstatat(3,"",0x0040007fe8e0,0x1000) = 0
18369 mmap(NULL,1405128,PROT_EXEC|PROT_READ,MAP_PRIVATE|MAP_DENYWRITE,3,0) = 
0x004000827000
18369 
mmap(0x00400096d000,20480,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_DENYWRITE|MAP_FIXED,3,0x146000)
 = 0x00400096d000
18369 
mmap(0x004000972000,49352,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANONYMOUS|MAP_FIXED,-1,0)
 = 0x004000972000
18369 close(3) = 0
18369 mmap(NULL,8192,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANONYMOUS,-1,0) = 
0x00400097f000
18369 set_tid_address(0x400097f710) = 18369
18369 set_robust_list(0x400097f720,24) = -1 errno=38 (Function not implemented)
18369 mprotect(0x00400096d000,12288,PROT_READ) = 0
18369 mprotect(0x00400082,4096,PROT_READ) = 0
18369 prlimit64(0,RLIMIT_STACK,NULL,0x0040007ff4e8) = 0 
({rlim_cur=8388608,rlim_max=-1})
18369 munmap(0x004000824000,8799) = 0
18369 newfstatat(1,"",0x0040007ff648,0x1000) = 0
18369 getrandom(0x4000976a40,8,1) = 8
18369 brk(NULL) = 0x003022e8
18369 brk(0x003232e8)**
ERROR:../accel/tcg/cpu-exec.c:1028:cpu_exec_setjmp: assertion failed: (cpu == 
current_cpu)
Bail out! ERROR:../accel/tcg/cpu-exec.c:1028:cpu_exec_setjmp: assertion failed: 
(cpu == current_cpu)
**
ERROR:../accel/tcg/cpu-exec.c:1028:cpu_exec_setjmp: assertion failed: (cpu == 
current_cpu)
Bail out! ERROR:../accel/tcg/cpu-exec.c:1028:cpu_exec_setjmp: assertion failed: 
(cpu == current_cpu)


This reminds me on a failure I once saw on the hppa target.
See commit bd4b7fd6ba98 ("linux-user/hppa: Fix segfaults on page zero").

Maybe the non-page-aligned brk address triggers glibc or the application in the
guest to jump somewhere else (see cpu_exec_setjmp)?
The example in my commit message jumped to address 0, which isn't writable
for applications on the target machine, and qemu failed to trigger/handle
the correct target exception.

I think your patch to page-align the initial brk() is correct, but it probably
just hides the real problem.

Maybe you are able to test what happens with exe1 on a physical RISC-V machine
if the brk address isn't page-aligned?
Maybe some exception handling for RISC-V is missing in qemu too?

Helge