Re: [PATCH v2 21/46] target/loongarch: Implement xvsigncov
On 6/30/23 08:58, Song Gao wrote:

This patch includes:
- XVSIGNCOV.{B/H/W/D}.

Signed-off-by: Song Gao
---
 target/loongarch/disas.c                     | 5 +
 target/loongarch/insn_trans/trans_lasx.c.inc | 5 +
 target/loongarch/insns.decode                | 5 +
 target/loongarch/vec.h                       | 2 ++
 target/loongarch/vec_helper.c                | 2 --
 5 files changed, 17 insertions(+), 2 deletions(-)

Reviewed-by: Richard Henderson

r~
Re: [PATCH v2 22/46] target/loongarch: Implement xvmskltz/xvmskgez/xvmsknz
On 6/30/23 08:58, Song Gao wrote:

-void HELPER(vmskltz_b)(CPULoongArchState *env, uint32_t vd, uint32_t vj)
+void HELPER(vmskltz_b)(CPULoongArchState *env,
+                       uint32_t oprsz, uint32_t vd, uint32_t vj)
 {
-    uint16_t temp = 0;
+    int i, max;
+    uint16_t temp;
     VReg *Vd = &(env->fpr[vd].vreg);
     VReg *Vj = &(env->fpr[vj].vreg);

-    temp = do_vmskltz_b(Vj->D(0));
-    temp |= (do_vmskltz_b(Vj->D(1)) << 8);
-    Vd->D(0) = temp;
-    Vd->D(1) = 0;
+    max = (oprsz == 16) ? 1 : 2;
+
+    for (i = 0; i < max; i++) {
+        temp = 0;
+        temp = do_vmskltz_b(Vj->D(2 * i));

void * and desc operands; loop over oprsz.

r~
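For readers unfamiliar with the convention Richard is asking for: QEMU's gvec-style helpers take raw void pointers to the registers plus a 32-bit descriptor, and derive the vector length with simd_oprsz(). A rough sketch of the requested shape (my reconstruction for illustration, not the final patch):

```c
/* Sketch: gvec-style signature, iterating once per 128-bit vector lane. */
void HELPER(vmskltz_b)(void *vd, void *vj, uint32_t desc)
{
    int i;
    VReg *Vd = (VReg *)vd;
    VReg *Vj = (VReg *)vj;
    int oprsz = simd_oprsz(desc);    /* 16 bytes for LSX, 32 for LASX */

    for (i = 0; i < oprsz / 16; i++) {
        uint16_t temp = do_vmskltz_b(Vj->D(2 * i));
        temp |= do_vmskltz_b(Vj->D(2 * i + 1)) << 8;
        Vd->D(2 * i) = temp;
        Vd->D(2 * i + 1) = 0;
    }
}
```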
Re: [PATCH] chore: rename `tricore_feature` to `is_tricore_feature_enabled`
Hi Rui,

On Thu, Jul 06, 2023 at 12:59:55PM -0400, Rui Chen wrote:
> While upgrading capstone to v5, there was some name clash with the
> tricore_feature in capstone (which was introduced in this PR), thus rename
> tricore_feature to is_tricore_feature_enabled.
>
> Build error log is below
>
> /opt/homebrew/Cellar/capstone/5.0/include/capstone/tricore.h:561:3: error:
> redefinition of 'tricore_feature' as different kind of symbol
> } tricore_feature;
>   ^
> ../target/tricore/cpu.h:261:19: note: previous definition is here
> static inline int tricore_feature(CPUTriCoreState *env, int feature)
>                   ^
> 1 error generated.

I ran into the same problem when trying out capstone. I think a better
name would be tricore_has_feature() to match has_feature() in
target/tricore/translate.c.

P.S. if you CC me it helps my mail filter to find your patch :). Also we
have the rule for qemu-devel to not send a patch as an attachment. See
(https://www.qemu.org/docs/master/devel/submitting-a-patch.html)

Cheers,
Bastian
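For context, the helper being renamed is a one-line predicate in target/tricore/cpu.h; with Bastian's suggested name the change would look roughly like this (a sketch -- only the signature appears in the quoted build error, so the body is reconstructed):

```c
/* Sketch: target/tricore/cpu.h with the suggested name. */
static inline int tricore_has_feature(CPUTriCoreState *env, int feature)
{
    return (env->features & (1ULL << feature)) != 0;
}
```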
Re: [PATCH v2 23/46] target/loongarch: Implement xvldi
On 6/30/23 08:58, Song Gao wrote:

This patch includes:
- XVLDI.

Signed-off-by: Song Gao
---
 target/loongarch/disas.c                     | 7 +++
 target/loongarch/insn_trans/trans_lasx.c.inc | 2 ++
 target/loongarch/insn_trans/trans_lsx.c.inc  | 6 --
 target/loongarch/insns.decode                | 2 ++
 4 files changed, 15 insertions(+), 2 deletions(-)

Reviewed-by: Richard Henderson

r~
Re: [PATCH v2 24/46] target/loongarch: Implement LASX logic instructions
On 6/30/23 08:58, Song Gao wrote:

+    len = (simd_oprsz(v) == 16) ? LSX_LEN : LASX_LEN;

Use simd_oprsz directly, without the rest of the computation.

r~
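One reading of the comment, assuming LSX_LEN is 128 and LASX_LEN is 256 (i.e. the 16- and 32-byte operation sizes scaled to bits):

```c
/* Sketch: simd_oprsz() already yields 16 or 32 bytes, so no selection
 * is needed -- the bit length is just that value times eight. */
len = simd_oprsz(v) * 8;
```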
Re: [PATCH v2 25/46] target/loongarch: Implement xvsll xvsrl xvsra xvrotr
On 6/30/23 08:58, Song Gao wrote:

This patch includes:
- XVSLL[I].{B/H/W/D};
- XVSRL[I].{B/H/W/D};
- XVSRA[I].{B/H/W/D};
- XVROTR[I].{B/H/W/D}.

Signed-off-by: Song Gao
---
 target/loongarch/disas.c                     | 36
 target/loongarch/insn_trans/trans_lasx.c.inc | 36
 target/loongarch/insns.decode                | 33 ++
 3 files changed, 105 insertions(+)

Reviewed-by: Richard Henderson

r~
Re: [PATCH v2 26/46] target/loongarch: Implement xvsllwil xvextl
On 6/30/23 08:58, Song Gao wrote:

+#define VSLLWIL(NAME, BIT, E1, E2)                                 \
+void HELPER(NAME)(CPULoongArchState *env, uint32_t oprsz,          \
+                  uint32_t vd, uint32_t vj, uint32_t imm)          \
+{                                                                  \
+    int i, max;                                                    \
+    VReg temp;                                                     \
+    VReg *Vd = &(env->fpr[vd].vreg);                               \
+    VReg *Vj = &(env->fpr[vj].vreg);                               \
+    typedef __typeof(temp.E1(0)) TD;                               \
+                                                                   \
+    temp.Q(0) = int128_zero();                                     \
+                                                                   \
+    if (oprsz == 32) {                                             \
+        temp.Q(1) = int128_zero();                                 \
+    }                                                              \
+                                                                   \
+    max = LSX_LEN / BIT;                                           \
+    for (i = 0; i < max; i++) {                                    \
+        temp.E1(i) = (TD)Vj->E2(i) << (imm % BIT);                 \
+        if (oprsz == 32) {                                         \
+            temp.E1(i + max) = (TD)Vj->E2(i + max * 2) << (imm % BIT); \
+        }                                                          \
+    }                                                              \
+    *Vd = temp;                                                    \
+}

Function parameters using void * and desc.
VReg temp = { }; instead of conditional partial assignment.
Fix iteration, as previously discussed.

r~
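Pulling Richard's three points together, the reworked macro could look something like this. This is my sketch of the requested shape (the per-lane loop structure is an assumption based on the earlier patches in the series), not the final code:

```c
/* Sketch: gvec-style signature, zero-initialized temp, per-lane loops. */
#define VSLLWIL(NAME, BIT, E1, E2)                                       \
void HELPER(NAME)(void *vd, void *vj, uint32_t imm, uint32_t desc)       \
{                                                                        \
    int i, j, ofs;                                                       \
    VReg temp = { };                                                     \
    VReg *Vd = (VReg *)vd;                                               \
    VReg *Vj = (VReg *)vj;                                               \
    int oprsz = simd_oprsz(desc);                                        \
    typedef __typeof(temp.E1(0)) TD;                                     \
                                                                         \
    ofs = LSX_LEN / BIT;                                                 \
    for (i = 0; i < oprsz / 16; i++) {         /* per 128-bit lane */    \
        for (j = 0; j < ofs; j++) {                                      \
            temp.E1(j + ofs * i) =                                       \
                (TD)Vj->E2(j + ofs * 2 * i) << (imm % BIT);              \
        }                                                                \
    }                                                                    \
    *Vd = temp;                                                          \
}
```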
Re: [PATCH v2 20/46] target/loongarch: Implement vext2xv
Hi, Richard

On 2023/7/8 at 5:19 AM, Richard Henderson wrote:
> On 6/30/23 08:58, Song Gao wrote:
>> +#define VEXT2XV(NAME, BIT, E1, E2)                          \
>> +void HELPER(NAME)(CPULoongArchState *env, uint32_t oprsz,   \
>> +                  uint32_t vd, uint32_t vj)                 \
>> +{                                                           \
>> +    int i;                                                  \
>> +    VReg *Vd = &(env->fpr[vd].vreg);                        \
>> +    VReg *Vj = &(env->fpr[vj].vreg);                        \
>> +    VReg temp;                                              \
>> +                                                            \
>> +    for (i = 0; i < LASX_LEN / BIT; i++) {                  \
>> +        temp.E1(i) = Vj->E2(i);                             \
>> +    }                                                       \
>> +    *Vd = temp;                                             \
>> +}
>
> So unlike VEXT(H), this does compress in order?

Yes.

> Anyway, function signature and iteration without LASX_LEN.
> Isn't there a 128-bit helper to merge this with?

There are no similar 128-bit instructions.

Thanks.
Song Gao
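As a rough illustration of "function signature and iteration without LASX_LEN" (my sketch; the element count is derived from the descriptor rather than a fixed constant):

```c
/* Sketch: derive the element count from the descriptor, not LASX_LEN. */
#define VEXT2XV(NAME, BIT, E1, E2)                      \
void HELPER(NAME)(void *vd, void *vj, uint32_t desc)    \
{                                                       \
    int i;                                              \
    VReg temp = { };                                    \
    VReg *Vd = (VReg *)vd;                              \
    VReg *Vj = (VReg *)vj;                              \
    int oprsz = simd_oprsz(desc);                       \
                                                        \
    for (i = 0; i < oprsz * 8 / BIT; i++) {             \
        temp.E1(i) = Vj->E2(i);                         \
    }                                                   \
    *Vd = temp;                                         \
}
```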
[PATCH v3 2/2] target/riscv: Optimize ambiguous local variable in pmp_hart_has_privs
These two values represent whether the start/end address is in the PMP
range. However, their type and name are ambiguous. This commit changes
their name and type to improve code readability and accuracy.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1735
Reviewed-by: Weiwei Li
Reviewed-by: Philippe Mathieu-Daudé
Signed-off-by: Ruibo Lu
---
 target/riscv/pmp.c | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/target/riscv/pmp.c b/target/riscv/pmp.c
index 1a9279ba88..ea3d29217a 100644
--- a/target/riscv/pmp.c
+++ b/target/riscv/pmp.c
@@ -203,16 +203,16 @@ void pmp_update_rule_nums(CPURISCVState *env)
     }
 }

-static int pmp_is_in_range(CPURISCVState *env, int pmp_index,
-                           target_ulong addr)
+static bool pmp_is_in_range(CPURISCVState *env, int pmp_index,
+                            target_ulong addr)
 {
-    int result = 0;
+    bool result = false;

     if ((addr >= env->pmp_state.addr[pmp_index].sa) &&
         (addr <= env->pmp_state.addr[pmp_index].ea)) {
-        result = 1;
+        result = true;
     } else {
-        result = 0;
+        result = false;
     }

     return result;
@@ -287,8 +287,8 @@ bool pmp_hart_has_privs(CPURISCVState *env, target_ulong addr,
 {
     int i = 0;
     int pmp_size = 0;
-    target_ulong s = 0;
-    target_ulong e = 0;
+    bool sa_in = false;
+    bool ea_in = false;

     /* Short cut if no rules */
     if (0 == pmp_get_num_rules(env)) {
@@ -314,11 +314,11 @@ bool pmp_hart_has_privs(CPURISCVState *env, target_ulong addr,
      * from low to high
      */
     for (i = 0; i < MAX_RISCV_PMPS; i++) {
-        s = pmp_is_in_range(env, i, addr);
-        e = pmp_is_in_range(env, i, addr + pmp_size - 1);
+        sa_in = pmp_is_in_range(env, i, addr);
+        ea_in = pmp_is_in_range(env, i, addr + pmp_size - 1);

         /* partially inside */
-        if ((s + e) == 1) {
+        if (sa_in ^ ea_in) {
             qemu_log_mask(LOG_GUEST_ERROR,
                           "pmp violation - access is partially inside\n");
             *allowed_privs = 0;
@@ -339,7 +339,7 @@ bool pmp_hart_has_privs(CPURISCVState *env, target_ulong addr,
             (env->pmp_state.pmp[i].cfg_reg & PMP_WRITE) |
             ((env->pmp_state.pmp[i].cfg_reg & PMP_EXEC) >> 2);

-        if (((s + e) == 2) && (PMP_AMATCH_OFF != a_field)) {
+        if (sa_in && ea_in && (PMP_AMATCH_OFF != a_field)) {
             /*
              * If the PMP entry is not off and the address is in range,
              * do the priv check
--
2.41.0
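Now that the helper returns bool, a natural follow-up (my aside, not part of this patch) would be to drop the if/else entirely and return the comparison directly:

```c
/* Sketch: the bool-returning predicate can collapse to one expression. */
static bool pmp_is_in_range(CPURISCVState *env, int pmp_index,
                            target_ulong addr)
{
    return (addr >= env->pmp_state.addr[pmp_index].sa) &&
           (addr <= env->pmp_state.addr[pmp_index].ea);
}
```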
[PATCH v3 0/2] target/riscv: improve code accuracy and
I'm so sorry. As a newcomer, I'm not familiar with the patch mechanism.
I mistakenly added the reviewer's "Reviewed-by" line to the wrong commit,
so I have resent this patchset.

Changes in v3:
* fix the alignment of the pmp_is_in_range parameter line

Changes in v2:
* change the initial values of sa_in and ea_in to false
* change the condition expression when the address area is fully in range

Ruibo Lu (2):
  target/riscv: Remove redundant check in pmp_is_locked
  target/riscv: Optimize ambiguous local variable in pmp_hart_has_privs

 target/riscv/pmp.c | 27 +++
 1 file changed, 11 insertions(+), 16 deletions(-)

--
2.41.0
[PATCH v3 1/2] target/riscv: Remove redundant check in pmp_is_locked
The check of the top PMP is redundant and does not influence the return
value, so remove it.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1733
Reviewed-by: Weiwei Li
Reviewed-by: Alistair Francis
Signed-off-by: Ruibo Lu
---
 target/riscv/pmp.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/target/riscv/pmp.c b/target/riscv/pmp.c
index 9d8db493e6..1a9279ba88 100644
--- a/target/riscv/pmp.c
+++ b/target/riscv/pmp.c
@@ -49,11 +49,6 @@ static inline int pmp_is_locked(CPURISCVState *env, uint32_t pmp_index)
         return 1;
     }

-    /* Top PMP has no 'next' to check */
-    if ((pmp_index + 1u) >= MAX_RISCV_PMPS) {
-        return 0;
-    }
-
     return 0;
 }
--
2.41.0
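After the removal, what remains of the function is just the lock-bit test; presumably it reduces to something like this (my sketch for context, based on the hunk above):

```c
/* Sketch of pmp_is_locked() after the redundant branch is removed. */
static inline int pmp_is_locked(CPURISCVState *env, uint32_t pmp_index)
{
    if (env->pmp_state.pmp[pmp_index].cfg_reg & PMP_LOCK) {
        return 1;
    }

    return 0;
}
```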
Re: Reducing vdpa migration downtime because of memory pin / maps
On 7/5/2023 10:46 PM, Eugenio Perez Martin wrote:
On Thu, Jul 6, 2023 at 2:13 AM Si-Wei Liu wrote:
On 7/5/2023 11:03 AM, Eugenio Perez Martin wrote:
On Tue, Jun 27, 2023 at 8:36 AM Si-Wei Liu wrote:
On 6/9/2023 7:32 AM, Eugenio Perez Martin wrote:
On Fri, Jun 9, 2023 at 12:39 AM Si-Wei Liu wrote:
On 6/7/23 01:08, Eugenio Perez Martin wrote:
On Wed, Jun 7, 2023 at 12:43 AM Si-Wei Liu wrote:

Sorry for reviving this old thread, I lost the best timing to follow up
on this while I was on vacation. I have been working on this and found
out some discrepancy, please see below.

On 4/5/23 04:37, Eugenio Perez Martin wrote:

Hi!

As mentioned in the last upstream virtio-networking meeting, one of the
factors that adds more downtime to migration is the handling of the
guest memory (pin, map, etc). At this moment this handling is bound to
the virtio life cycle (DRIVER_OK, RESET). In that sense, the destination
device waits until all the guest memory / state is migrated to start
pinning all the memory.

The proposal is to bind it to the char device life cycle (open vs close),

Hmmm, really? If it's the life cycle for the char device, the next
guest / qemu launch on the same vhost-vdpa device node won't make it
work.

Maybe my sentence was not accurate, but I think we're on the same page
here. Two qemu instances opening the same char device at the same time
are not allowed, and vhost_vdpa_release cleans all the maps. So the next
qemu that opens the char device should see a clean device anyway.

I mean the pin can't be done at the time of char device open, where the
user address space is not known/bound yet. The earliest point possible
for pinning would be until the vhost_attach_mm() call from SET_OWNER is
done.

Maybe we are deviating, let me start again.

Using QEMU code, what I'm proposing is to modify the lifecycle of the
.listener member of struct vhost_vdpa. At this moment, the memory
listener is registered at the vhost_vdpa_dev_start(dev, started=true)
call for the last vhost_dev, and is unregistered in both
vhost_vdpa_reset_status and vhost_vdpa_cleanup.

My original proposal was just to move the memory listener registration
to the last vhost_vdpa_init, and remove the unregister from
vhost_vdpa_reset_status. The calls to vhost_vdpa_dma_map/unmap would be
the same; the device should not realize this change.

This can address LM downtime latency for sure, but it won't help
downtime during a dynamic SVQ switch - which still needs to go through
the full unmap/map cycle (that includes the slow part for pinning) from
passthrough to SVQ mode. Note that not every device can work with a
separate ASID for SVQ descriptors. The fix should expect to work on
normal vDPA vendor devices without a separate descriptor ASID, with a
platform IOMMU underneath or with an on-chip IOMMU.

At this moment the SVQ switch is very inefficient mapping-wise, as it
unmaps all the GPA->HVA maps and overrides them. In particular, SVQ is
allocated in low regions of the iova space, and then the guest memory is
allocated in this new IOVA region incrementally.

Yep. The key to building this fast path for SVQ switching, I think, is
to maintain the identity mapping for the passthrough queues so that QEMU
can reuse the old mappings for guest memory (e.g. GIOVA identity mapped
to GPA) while incrementally adding new mappings for the SVQ vrings.

We can optimize that if we place SVQ in a free GPA area instead.
Here's a question though: it might not be hard to find a free GPA range
for the non-vIOMMU case (allocate iova from beyond the 48-bit or 52-bit
ranges), but I'm not sure it is easy to find a free GIOVA range for the
vIOMMU case - particularly, this has to work in the same entire 64-bit
IOVA address range, and (for now) QEMU won't be able to "reserve" a
specific IOVA range for SVQ from the vIOMMU. Do you foresee that this can
be done for every QEMU-emulated vIOMMU (intel-iommu, amd-iommu, arm smmu
and virtio-iommu) so that we can call it out as a generic means for SVQ
switching optimization?

In the case the vIOMMU allocates a new block, we will use the same
algorithm as now:
* Find a new free IOVA chunk of the same size
* Map this new SVQ IOVA, that may or may not be the same as SVQ

Since we must go through the translation phase to sanitize the guest's
available descriptors anyway, it has zero added cost.

Not sure I followed; this can work, but it doesn't seem able to reuse the
old host kernel mappings for guest memory, hence it still requires a
remap of the entire host IOVA range when the SVQ IOVA comes along. I
think by maintaining a 1:1 identity map on guest memory, we don't have to
bother tearing down existing HVA->HPA mappings in the kernel, thus saving
the expensive pinning calls at large. I don't clearly see, under this
scheme, how the new SVQ IOVA may work with a potential conflict on IOVA
space from hotplugged memory - in this case the 1:1 IOVA->GPA identity
guest memory mapping can't be kept. Another option would be to move the
SVQ vring to a new region, but I don't see an
[PATCH v3 0/2] Vhost-vdpa Shadow Virtqueue _F_CTRL_RX_EXTRA commands support
This series enables shadowed CVQ to intercept rx commands related to the
VIRTIO_NET_F_CTRL_RX_EXTRA feature through shadowed CVQ, updates the
virtio NIC device model so QEMU sends it in a migration, and restores
that rx state in the destination.

To test this patch series, one should modify the
`n->parent_obj.guest_features` value in vhost_vdpa_net_load_rx() using
gdb, as the linux virtio-net driver does not currently support the
VIRTIO_NET_F_CTRL_RX_EXTRA feature.

Note that this patch should be based on [1] patch "Vhost-vdpa Shadow
Virtqueue _F_CTRL_RX commands support"

[1]. https://lore.kernel.org/all/cover.1688743107.git.yin31...@gmail.com/

TestStep
========
1. test the patch series using vp-vdpa device

- For L0 guest, boot QEMU with virtio-net-pci net device with `ctrl_vq`,
`ctrl_rx` and `ctrl_rx_extra` feature on, something like:

  -device virtio-net-pci,rx_queue_size=256,tx_queue_size=256,
   iommu_platform=on,ctrl_vq=on,ctrl_rx=on,ctrl_rx_extra=on...

- For L1 guest, apply the patch series and compile the code, start QEMU
with vdpa device with svq mode and enable the `ctrl_vq`, `ctrl_rx` and
`ctrl_rx_extra` feature on, something like:

  -netdev type=vhost-vdpa,x-svq=true,...
  -device virtio-net-pci,ctrl_vq=on,ctrl_rx=on,ctrl_rx_extra=on...

Use gdb to attach the VM and break at the net/vhost-vdpa.c:870. With this
series, gdb can hit the breakpoint. Enable the VIRTIO_NET_F_CTRL_RX_EXTRA
feature and enable the non-unicast mode by entering the following gdb
commands:

```gdb
set n->parent_obj.guest_features |= (1 << 20)
set n->nouni = 1
c
```

QEMU should not trigger any errors or warnings. Without this series, QEMU
should fail with "x-svq=true: vdpa svq does not work with features 0x10".

ChangeLog
=========
v3:
- return early if the condition mismatches, suggested by Eugenio, in
  patch 1 "vdpa: Restore packet receive filtering state relative with
  _F_CTRL_RX_EXTRA feature"
- remove the `on` variable, suggested by Eugenio, in patch 1 "vdpa:
  Restore packet receive filtering state relative with _F_CTRL_RX_EXTRA
  feature"

v2: https://lore.kernel.org/all/cover.1688365324.git.yin31...@gmail.com/
- avoid sending CVQ command in default state, suggested by Eugenio

v1: https://lists.nongnu.org/archive/html/qemu-devel/2023-06/msg04956.html

Hawkins Jiawei (2):
  vdpa: Restore packet receive filtering state relative with
    _F_CTRL_RX_EXTRA feature
  vdpa: Allow VIRTIO_NET_F_CTRL_RX_EXTRA in SVQ

 net/vhost-vdpa.c | 89 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 89 insertions(+)

--
2.25.1
Re: [PATCH v7 12/15] target/riscv: Add Zvkg ISA extension support
Hi,

This patch breaks some gitlab runners because of this:

On 7/2/23 12:53, Max Chou wrote:

From: Nazar Kazakov

This commit adds support for the Zvkg vector-crypto extension, which
consists of the following instructions:
* vgmul.vv
* vghsh.vv

Translation functions are defined in
`target/riscv/insn_trans/trans_rvvk.c.inc` and helpers are defined in
`target/riscv/vcrypto_helper.c`.

Co-authored-by: Lawrence Hunter
[max.c...@sifive.com: Replaced vstart checking by TCG op]
Signed-off-by: Lawrence Hunter
Signed-off-by: Nazar Kazakov
Signed-off-by: Max Chou
Reviewed-by: Daniel Henrique Barboza
[max.c...@sifive.com: Exposed x-zvkg property]
---
 target/riscv/cpu.c                       |  6 +-
 target/riscv/cpu_cfg.h                   |  1 +
 target/riscv/helper.h                    |  3 +
 target/riscv/insn32.decode               |  4 ++
 target/riscv/insn_trans/trans_rvvk.c.inc | 30 ++++++++++
 target/riscv/vcrypto_helper.c            | 72 ++++++++++++++++++++
 6 files changed, 114 insertions(+), 2 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 08b8355f52..699ab5e9fa 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -118,6 +118,7 @@ static const struct isa_ext_data isa_edata_arr[] = {
     ISA_EXT_DATA_ENTRY(zve64d, PRIV_VERSION_1_10_0, ext_zve64d),
     ISA_EXT_DATA_ENTRY(zvfh, PRIV_VERSION_1_12_0, ext_zvfh),
     ISA_EXT_DATA_ENTRY(zvfhmin, PRIV_VERSION_1_12_0, ext_zvfhmin),
+    ISA_EXT_DATA_ENTRY(zvkg, PRIV_VERSION_1_12_0, ext_zvkg),
     ISA_EXT_DATA_ENTRY(zvkned, PRIV_VERSION_1_12_0, ext_zvkned),
     ISA_EXT_DATA_ENTRY(zvknha, PRIV_VERSION_1_12_0, ext_zvknha),
     ISA_EXT_DATA_ENTRY(zvknhb, PRIV_VERSION_1_12_0, ext_zvknhb),
@@ -1194,8 +1195,8 @@ void riscv_cpu_validate_set_extensions(RISCVCPU *cpu, Error **errp)
      * In principle Zve*x would also suffice here, were they supported
      * in qemu
      */
-    if ((cpu->cfg.ext_zvbb || cpu->cfg.ext_zvkned || cpu->cfg.ext_zvknha ||
-         cpu->cfg.ext_zvksh) && !cpu->cfg.ext_zve32f) {
+    if ((cpu->cfg.ext_zvbb || cpu->cfg.ext_zvkg || cpu->cfg.ext_zvkned ||
+         cpu->cfg.ext_zvknha || cpu->cfg.ext_zvksh) && !cpu->cfg.ext_zve32f) {
         error_setg(errp,
                    "Vector crypto extensions require V or Zve* extensions");
         return;
@@ -1710,6 +1711,7 @@ static Property riscv_cpu_extensions[] = {
     /* Vector cryptography extensions */
     DEFINE_PROP_BOOL("x-zvbb", RISCVCPU, cfg.ext_zvbb, false),
     DEFINE_PROP_BOOL("x-zvbc", RISCVCPU, cfg.ext_zvbc, false),
+    DEFINE_PROP_BOOL("x-zvkg", RISCVCPU, cfg.ext_zvkg, false),
     DEFINE_PROP_BOOL("x-zvkned", RISCVCPU, cfg.ext_zvkned, false),
     DEFINE_PROP_BOOL("x-zvknha", RISCVCPU, cfg.ext_zvknha, false),
     DEFINE_PROP_BOOL("x-zvknhb", RISCVCPU, cfg.ext_zvknhb, false),

diff --git a/target/riscv/cpu_cfg.h b/target/riscv/cpu_cfg.h
index 27062b12a8..960761c479 100644
--- a/target/riscv/cpu_cfg.h
+++ b/target/riscv/cpu_cfg.h
@@ -85,6 +85,7 @@ struct RISCVCPUConfig {
     bool ext_zve64d;
     bool ext_zvbb;
     bool ext_zvbc;
+    bool ext_zvkg;
     bool ext_zvkned;
     bool ext_zvknha;
     bool ext_zvknhb;

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 172c91c65c..238343cb42 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1244,3 +1244,6 @@ DEF_HELPER_5(vsha2cl64_vv, void, ptr, ptr, ptr, env, i32)
 DEF_HELPER_5(vsm3me_vv, void, ptr, ptr, ptr, env, i32)
 DEF_HELPER_5(vsm3c_vi, void, ptr, ptr, i32, env, i32)
+
+DEF_HELPER_5(vghsh_vv, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_4(vgmul_vv, void, ptr, ptr, env, i32)

diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 5ca83e8462..b10497afd3 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -957,3 +957,7 @@ vsha2cl_vv 10 1 . . 010 . 1110111 @r_vm_1
 # *** Zvksh vector crypto extension ***
 vsm3me_vv 10 1 . . 010 . 1110111 @r_vm_1
 vsm3c_vi 101011 1 . . 010 . 1110111 @r_vm_1
+
+# *** Zvkg vector crypto extension ***
+vghsh_vv 101100 1 . . 010 . 1110111 @r_vm_1
+vgmul_vv 101000 1 . 10001 010 . 1110111 @r2_vm_1

diff --git a/target/riscv/insn_trans/trans_rvvk.c.inc b/target/riscv/insn_trans/trans_rvvk.c.inc
index 6469dd2f02..af7cd62e7d 100644
--- a/target/riscv/insn_trans/trans_rvvk.c.inc
+++ b/target/riscv/insn_trans/trans_rvvk.c.inc
@@ -531,3 +531,33 @@ static inline bool vsm3c_check(DisasContext *s, arg_rmrr *a)
 GEN_VV_UNMASKED_TRANS(vsm3me_vv, vsm3me_check, ZVKSH_EGS)
 GEN_VI_UNMASKED_TRANS(vsm3c_vi, vsm3c_check, ZVKSH_EGS)
+
+/*
+ * Zvkg
+ */
+
+#define ZVKG_EGS 4
+
+static bool vgmul_check(DisasContext *s, arg_rmr *a)
+{
+    int egw_bytes = ZVKG_EGS << s->sew;
+    return s->cfg_ptr->ext_zvkg == true &&
+           vext_check_isa_ill(s) &&
+           require_rvv(s) &&
+           MAXSZ(s) >= egw_bytes &&
+           vext_check_ss(s, a->rd, a->rs2, a->vm) &&
+
[PATCH v3 1/2] vdpa: Restore packet receive filtering state relative with _F_CTRL_RX_EXTRA feature
This patch refactors vhost_vdpa_net_load_rx() to restore the packet
receive filtering state in relation to the VIRTIO_NET_F_CTRL_RX_EXTRA
feature at the device's startup.

Signed-off-by: Hawkins Jiawei
---
v3:
- return early if the condition mismatches, suggested by Eugenio
- remove the `on` variable, suggested by Eugenio

v2: https://lore.kernel.org/all/66ec4d7e3a680de645043d0331ab65940154f2b8.1688365324.git.yin31...@gmail.com/
- avoid sending CVQ command in default state, suggested by Eugenio

v1: https://lists.nongnu.org/archive/html/qemu-devel/2023-06/msg04957.html

 net/vhost-vdpa.c | 88 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 88 insertions(+)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 0994836f8c..9a1905fddd 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -867,6 +867,94 @@ static int vhost_vdpa_net_load_rx(VhostVDPAState *s,
         }
     }

+    if (!virtio_vdev_has_feature(&n->parent_obj, VIRTIO_NET_F_CTRL_RX_EXTRA)) {
+        return 0;
+    }
+
+    /*
+     * According to virtio_net_reset(), device turns all-unicast mode
+     * off by default.
+     *
+     * Therefore, QEMU should only send this CVQ command if the driver
+     * sets all-unicast mode on, different from the device's defaults.
+     *
+     * Note that the device's defaults can mismatch the driver's
+     * configuration only at live migration.
+     */
+    if (n->alluni) {
+        dev_written = vhost_vdpa_net_load_rx_mode(s,
+                                            VIRTIO_NET_CTRL_RX_ALLUNI, 1);
+        if (dev_written < 0) {
+            return dev_written;
+        }
+        if (*s->status != VIRTIO_NET_OK) {
+            return -EIO;
+        }
+    }
+
+    /*
+     * According to virtio_net_reset(), device turns non-multicast mode
+     * off by default.
+     *
+     * Therefore, QEMU should only send this CVQ command if the driver
+     * sets non-multicast mode on, different from the device's defaults.
+     *
+     * Note that the device's defaults can mismatch the driver's
+     * configuration only at live migration.
+     */
+    if (n->nomulti) {
+        dev_written = vhost_vdpa_net_load_rx_mode(s,
+                                            VIRTIO_NET_CTRL_RX_NOMULTI, 1);
+        if (dev_written < 0) {
+            return dev_written;
+        }
+        if (*s->status != VIRTIO_NET_OK) {
+            return -EIO;
+        }
+    }
+
+    /*
+     * According to virtio_net_reset(), device turns non-unicast mode
+     * off by default.
+     *
+     * Therefore, QEMU should only send this CVQ command if the driver
+     * sets non-unicast mode on, different from the device's defaults.
+     *
+     * Note that the device's defaults can mismatch the driver's
+     * configuration only at live migration.
+     */
+    if (n->nouni) {
+        dev_written = vhost_vdpa_net_load_rx_mode(s,
+                                            VIRTIO_NET_CTRL_RX_NOUNI, 1);
+        if (dev_written < 0) {
+            return dev_written;
+        }
+        if (*s->status != VIRTIO_NET_OK) {
+            return -EIO;
+        }
+    }
+
+    /*
+     * According to virtio_net_reset(), device turns non-broadcast mode
+     * off by default.
+     *
+     * Therefore, QEMU should only send this CVQ command if the driver
+     * sets non-broadcast mode on, different from the device's defaults.
+     *
+     * Note that the device's defaults can mismatch the driver's
+     * configuration only at live migration.
+     */
+    if (n->nobcast) {
+        dev_written = vhost_vdpa_net_load_rx_mode(s,
+                                            VIRTIO_NET_CTRL_RX_NOBCAST, 1);
+        if (dev_written < 0) {
+            return dev_written;
+        }
+        if (*s->status != VIRTIO_NET_OK) {
+            return -EIO;
+        }
+    }
+
     return 0;
 }
--
2.25.1
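The four blocks differ only in the flag tested and the subcommand sent. As a readability illustration only (my sketch, not what the patch does), they could be driven from a small table:

```c
/* Sketch: the four near-identical rx-extra restores, driven by one table. */
const struct {
    bool    enabled;   /* driver state, read from the device model */
    uint8_t cmd;       /* VIRTIO_NET_CTRL_RX_* subcommand */
} modes[] = {
    { n->alluni,  VIRTIO_NET_CTRL_RX_ALLUNI  },
    { n->nomulti, VIRTIO_NET_CTRL_RX_NOMULTI },
    { n->nouni,   VIRTIO_NET_CTRL_RX_NOUNI   },
    { n->nobcast, VIRTIO_NET_CTRL_RX_NOBCAST },
};

for (size_t i = 0; i < ARRAY_SIZE(modes); i++) {
    if (!modes[i].enabled) {
        continue;   /* device default already matches the driver */
    }
    dev_written = vhost_vdpa_net_load_rx_mode(s, modes[i].cmd, 1);
    if (dev_written < 0) {
        return dev_written;
    }
    if (*s->status != VIRTIO_NET_OK) {
        return -EIO;
    }
}
```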
[PATCH v3 2/2] vdpa: Allow VIRTIO_NET_F_CTRL_RX_EXTRA in SVQ
Enable SVQ with the VIRTIO_NET_F_CTRL_RX_EXTRA feature.

Signed-off-by: Hawkins Jiawei
Acked-by: Eugenio Pérez
---
 net/vhost-vdpa.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 9a1905fddd..1df82636c9 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -105,6 +105,7 @@ static const uint64_t vdpa_svq_device_features =
     BIT_ULL(VIRTIO_NET_F_STATUS) |
     BIT_ULL(VIRTIO_NET_F_CTRL_VQ) |
     BIT_ULL(VIRTIO_NET_F_CTRL_RX) |
+    BIT_ULL(VIRTIO_NET_F_CTRL_RX_EXTRA) |
     BIT_ULL(VIRTIO_NET_F_MQ) |
     BIT_ULL(VIRTIO_F_ANY_LAYOUT) |
     BIT_ULL(VIRTIO_NET_F_CTRL_MAC_ADDR) |
--
2.25.1
Re: [PULL 29/38] gdbstub: Permit reverse step/break to provide stop response
Michael Tokarev writes:

> 03.07.2023 16:44, Alex Bennée wrote:
>> From: Nicholas Piggin
>> The final part of the reverse step and break handling is to bring
>> the machine back to a debug stop state. gdb expects a response.
>> A gdb 'rsi' command hangs forever because the gdbstub filters out
>> the response (also observable with reverse_debugging.py avocado
>> tests).
>> Fix by setting allow_stop_reply for the gdb backward packets.
>> Fixes: 758370052fb ("gdbstub: only send stop-reply packets when
>> allowed to")
>> Cc: qemu-sta...@nongnu.org
>
> Hi!
>
> Are you guys sure this needs to be in -stable?
>
> To me it looks a sort of "partial revert" of a previous commit:
>
> commit 758370052fb602f9f23c3b8ae26a6133373c78e6
> Author: Matheus Tavares Bernardino
> Date: Thu May 4 12:37:31 2023 -0300
> Subject: gdbstub: only send stop-reply packets when allowed to
>
> which introduced the `allow_stop_reply' field in GdbCmdParseEntry.
> This change ("gdbstub: Permit..") does not work in 8.0 without
> the above mentioned "gdbstub: only send" commit, and I guess
> it is *not* supposed to be in stable. Or is it?
>
> I'm not applying this one to stable for now.

Good catch - you're right, it's purely fixing something that has been
merged in the current cycle.

> Thanks,
>
> /mjt
>
>> Cc: Matheus Tavares Bernardino
>> Cc: Alex Bennée
>> Cc: Taylor Simpson
>> Signed-off-by: Nicholas Piggin
>> Acked-by: Matheus Tavares Bernardino
>> Message-Id: <20230623035304.279833-1-npig...@gmail.com>
>> Signed-off-by: Alex Bennée
>> Message-Id: <20230630180423.558337-30-alex.ben...@linaro.org>
>>
>> diff --git a/gdbstub/gdbstub.c b/gdbstub/gdbstub.c
>> index be18568d0a..9496d7b175 100644
>> --- a/gdbstub/gdbstub.c
>> +++ b/gdbstub/gdbstub.c
>> @@ -1814,6 +1814,7 @@ static int gdb_handle_packet(const char *line_buf)
>>              .handler = handle_backward,
>>              .cmd = "b",
>>              .cmd_startswith = 1,
>> +            .allow_stop_reply = true,
>>              .schema = "o0"
>>          };
>>          cmd_parser = &backward_cmd_desc;

--
Alex Bennée
Virtualisation Tech Lead @ Linaro
Re: [PATCH v2 1/2] accel/tcg: Split out cpu_exec_longjmp_cleanup
Richard Henderson writes:

> Share the setjmp cleanup between cpu_exec_step_atomic
> and cpu_exec_setjmp.
>
> Reviewed-by: Richard W.M. Jones
> Signed-off-by: Richard Henderson

Reviewed-by: Alex Bennée

--
Alex Bennée
Virtualisation Tech Lead @ Linaro
[PULL 2/3] linux-user: Fix accept4(SOCK_NONBLOCK) syscall
The Linux accept4() syscall allows two flags only: SOCK_NONBLOCK and
SOCK_CLOEXEC, and returns -EINVAL if any other bits have been set.
Change the qemu implementation accordingly, which means we cannot use the
fcntl_flags_tbl[] translation table, which allows too many other values.

Beside the correction in behaviour, this actually fixes the accept4()
emulation for the hppa, mips and alpha targets, for which SOCK_NONBLOCK
is different from TARGET_SOCK_NONBLOCK (aka O_NONBLOCK).

The fix can be verified with the testcase of the debian lwt package,
which hangs forever in a read() syscall without this patch.

Signed-off-by: Helge Deller
Reviewed-by: Richard Henderson
---
 linux-user/syscall.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 10f05b1e55..9b9e3bd5e3 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -3440,7 +3440,17 @@ static abi_long do_accept4(int fd, abi_ulong target_addr,
     abi_long ret;
     int host_flags;

-    host_flags = target_to_host_bitmask(flags, fcntl_flags_tbl);
+    if (flags & ~(TARGET_SOCK_CLOEXEC | TARGET_SOCK_NONBLOCK)) {
+        return -TARGET_EINVAL;
+    }
+
+    host_flags = 0;
+    if (flags & TARGET_SOCK_NONBLOCK) {
+        host_flags |= SOCK_NONBLOCK;
+    }
+    if (flags & TARGET_SOCK_CLOEXEC) {
+        host_flags |= SOCK_CLOEXEC;
+    }

     if (target_addr == 0) {
         return get_errno(safe_accept4(fd, NULL, NULL, host_flags));
--
2.41.0
[PULL 0/3] Linux user fcntl64 patches
The following changes since commit 97c81ef4b8e203d9620fd46e7eb77004563e3675:

  Merge tag 'pull-9p-20230706' of https://github.com/cschoenebeck/qemu into staging (2023-07-06 18:19:42 +0100)

are available in the Git repository at:

  https://github.com/hdeller/qemu-hppa.git tags/linux-user-fcntl64-pull-request

for you to fetch changes up to 036cf169a3484eeca5e17cfbee1f6988043ddd0e:

  linux-user: Improve strace output of pread64() and pwrite64() (2023-07-08 16:55:08 +0200)

----------------------------------------------------------------
linux-user: Fix fcntl64() and accept4() for 32-bit targets

A set of 3 patches: The first two patches fix fcntl64() and accept4().
The 3rd patch enhances the strace output for pread64() and pwrite64().

This pull request does not include Richard's mmap2 patch:
https://patchew.org/QEMU/20230630132159.376995-1-richard.hender...@linaro.org/20230630132159.376995-12-richard.hender...@linaro.org/

Changes:
v3:
- added r-b from Richard to patches #1 and #2
v2:
- rephrased commit logs
- return O_LARGEFILE for fcntl() syscall too
- dropped #ifdefs in accept4() patch
- Dropped my mmap2() patch (former patch #3)
- added r-b from Richard to 3rd patch

Helge

----------------------------------------------------------------
Helge Deller (3):
  linux-user: Fix fcntl() and fcntl64() to return O_LARGEFILE for
    32-bit targets
  linux-user: Fix accept4(SOCK_NONBLOCK) syscall
  linux-user: Improve strace output of pread64() and pwrite64()

 linux-user/strace.c    | 19 +++++++++++++++++++
 linux-user/strace.list |  4 ++--
 linux-user/syscall.c   | 16 +++++++++++++++-
 3 files changed, 36 insertions(+), 3 deletions(-)

--
2.41.0
[PULL 1/3] linux-user: Fix fcntl() and fcntl64() to return O_LARGEFILE for 32-bit targets
When running a 32-bit guest on a 64-bit host, fcntl[64](F_GETFL) should
return with the TARGET_O_LARGEFILE flag set, because all 64-bit hosts
support large files unconditionally.

But on 64-bit hosts, O_LARGEFILE has the value 0, so the flag translation
can't be done with the fcntl_flags_tbl[]. Instead, add the
TARGET_O_LARGEFILE flag afterwards.

Note that for 64-bit guests the compiler will optimize away this code,
since TARGET_O_LARGEFILE is zero.

Signed-off-by: Helge Deller
Reviewed-by: Richard Henderson
---
 linux-user/syscall.c | 4
 1 file changed, 4 insertions(+)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 08162cc966..10f05b1e55 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -7132,6 +7132,10 @@ static abi_long do_fcntl(int fd, int cmd, abi_ulong arg)
         ret = get_errno(safe_fcntl(fd, host_cmd, arg));
         if (ret >= 0) {
             ret = host_to_target_bitmask(ret, fcntl_flags_tbl);
+            /* tell 32-bit guests it uses largefile on 64-bit hosts: */
+            if (O_LARGEFILE == 0 && HOST_LONG_BITS == 64) {
+                ret |= TARGET_O_LARGEFILE;
+            }
         }
         break;
--
2.41.0
[PULL 3/3] linux-user: Improve strace output of pread64() and pwrite64()
Make the strace output look nicer for those two syscalls.

Signed-off-by: Helge Deller
Reviewed-by: Richard Henderson
---
 linux-user/strace.c    | 19 +++
 linux-user/strace.list |  4 ++--
 2 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/linux-user/strace.c b/linux-user/strace.c
index aad2b62ca4..669200c4a4 100644
--- a/linux-user/strace.c
+++ b/linux-user/strace.c
@@ -3999,6 +3999,25 @@ print_tgkill(CPUArchState *cpu_env, const struct syscallname *name,
 }
 #endif

+#if defined(TARGET_NR_pread64) || defined(TARGET_NR_pwrite64)
+static void
+print_pread64(CPUArchState *cpu_env, const struct syscallname *name,
+              abi_long arg0, abi_long arg1, abi_long arg2,
+              abi_long arg3, abi_long arg4, abi_long arg5)
+{
+    if (regpairs_aligned(cpu_env, TARGET_NR_pread64)) {
+        arg3 = arg4;
+        arg4 = arg5;
+    }
+    print_syscall_prologue(name);
+    print_raw_param("%d", arg0, 0);
+    print_pointer(arg1, 0);
+    print_raw_param("%d", arg2, 0);
+    print_raw_param("%" PRIu64, target_offset64(arg3, arg4), 1);
+    print_syscall_epilogue(name);
+}
+#endif
+
 #ifdef TARGET_NR_statx
 static void
 print_statx(CPUArchState *cpu_env, const struct syscallname *name,

diff --git a/linux-user/strace.list b/linux-user/strace.list
index c7808ea118..6655d4f26d 100644
--- a/linux-user/strace.list
+++ b/linux-user/strace.list
@@ -1068,7 +1068,7 @@
 { TARGET_NR_prctl, "prctl" , NULL, NULL, NULL },
 #endif
 #ifdef TARGET_NR_pread64
-{ TARGET_NR_pread64, "pread64" , NULL, NULL, NULL },
+{ TARGET_NR_pread64, "pread64" , NULL, print_pread64, NULL },
 #endif
 #ifdef TARGET_NR_preadv
 { TARGET_NR_preadv, "preadv" , NULL, NULL, NULL },
@@ -1099,7 +1099,7 @@
 { TARGET_NR_putpmsg, "putpmsg" , NULL, NULL, NULL },
 #endif
 #ifdef TARGET_NR_pwrite64
-{ TARGET_NR_pwrite64, "pwrite64" , NULL, NULL, NULL },
+{ TARGET_NR_pwrite64, "pwrite64" , NULL, print_pread64, NULL },
 #endif
 #ifdef TARGET_NR_pwritev
 { TARGET_NR_pwritev, "pwritev" , NULL, NULL, NULL },
--
2.41.0
Re: [PULL trival-patches 00/10] trivial-patches for 2023-07-08
On 7/8/23 06:12, Michael Tokarev wrote:
> The following changes since commit 3b08e40b7abfe8be6020c4c27c93ad85590b9213:
>
>   Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging (2023-07-07 20:23:01 +0100)
>
> are available in the Git repository at:
>
>   https://gitlab.com/mjt0k/qemu.git tags/trivial-patches-20230708
>
> for you to fetch changes up to 13a637430be13bda3e6726752936321a1955bc93:
>
>   hw/arm/virt-acpi-build.c: Add missing header (2023-07-08 07:24:38 +0300)
>
> qemu trivial patches for 2023-07-08

Applied, thanks. Please update https://wiki.qemu.org/ChangeLog/8.1 as appropriate.

r~
Re: [PATCH v4 26/37] target/arm: Use aesdec_ISB_ISR_AK
On 3/7/23 12:05, Richard Henderson wrote:

This implements the AESD instruction.

Signed-off-by: Richard Henderson
---
 target/arm/tcg/crypto_helper.c | 37 +++---
 1 file changed, 16 insertions(+), 21 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé
Re: [PATCH v4 25/37] target/arm: Use aesenc_SB_SR_AK
On 3/7/23 12:05, Richard Henderson wrote:

This implements the AESE instruction.

Signed-off-by: Richard Henderson
---
 target/arm/tcg/crypto_helper.c | 24 +++-
 1 file changed, 23 insertions(+), 1 deletion(-)

Reviewed-by: Philippe Mathieu-Daudé
Re: [PATCH v4 27/37] target/arm: Use aesenc_MC
On 3/7/23 12:05, Richard Henderson wrote:

This implements the AESMC instruction.

Signed-off-by: Richard Henderson
---
 target/arm/tcg/crypto_helper.c | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

Reviewed-by: Philippe Mathieu-Daudé
Re: [PATCH v4 28/37] target/arm: Use aesdec_IMC
On 3/7/23 12:05, Richard Henderson wrote:

This implements the AESIMC instruction.

We have converted everything to crypto/aes-round.h;
crypto/aes.h is no longer needed.

Signed-off-by: Richard Henderson
---
 target/arm/tcg/crypto_helper.c | 33 ++---
 1 file changed, 14 insertions(+), 19 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé
Re: [PATCH v4 31/37] target/riscv: Use aesdec_IMC
On 3/7/23 12:05, Richard Henderson wrote:

This implements the AES64IM instruction.

Signed-off-by: Richard Henderson
---
 target/riscv/crypto_helper.c | 15 +--
 1 file changed, 5 insertions(+), 10 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé
Re: [PATCH] linux-user: make sure brk(0) returns a page-aligned value
On 7/6/23 12:34, Andreas Schwab wrote:
> Fixes: 86f04735ac ("linux-user: Fix brk() to release pages")
> Signed-off-by: Andreas Schwab
> ---
>  linux-user/syscall.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/linux-user/syscall.c b/linux-user/syscall.c
> index 08162cc966..e8a17377f5 100644
> --- a/linux-user/syscall.c
> +++ b/linux-user/syscall.c
> @@ -805,7 +805,7 @@ static abi_ulong brk_page;
>
>  void target_set_brk(abi_ulong new_brk)
>  {
> -    target_brk = new_brk;
> +    target_brk = TARGET_PAGE_ALIGN(new_brk);
>      brk_page = HOST_PAGE_ALIGN(target_brk);
>  }

It makes sense, since that's how do_brk aligns things. I'm curious why
this error might have produced host memory clobbering, but I'm not going
to debug that.

Queuing for tcg/linux-user.

r~
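To make the effect concrete with the values from the RISC-V strace later in this digest (assuming 4 KiB target pages; the macro below is rewritten for illustration, not copied from QEMU):

```c
#define TARGET_PAGE_SIZE 4096  /* assumption: 4 KiB target pages */
#define TARGET_PAGE_ALIGN(a) \
    (((a) + TARGET_PAGE_SIZE - 1) & ~(unsigned long)(TARGET_PAGE_SIZE - 1))

/* TARGET_PAGE_ALIGN(0x3022e8) == 0x303000, so the patched qemu-riscv64
 * reports brk(NULL) = 0x00303000 instead of the raw ELF end-of-data
 * 0x003022e8, and later brk() calls grow the heap by whole pages. */
```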
Re: [PATCH v4 33/37] target/riscv: Use aesdec_ISB_ISR_IMC_AK
On 3/7/23 12:05, Richard Henderson wrote:
> This implements the AES64DSM instruction.
> This was the last use of aes64_operation and its support macros,
> so remove them all.
>
> Signed-off-by: Richard Henderson
> ---
>  target/riscv/crypto_helper.c | 101 ---
>  1 file changed, 10 insertions(+), 91 deletions(-)
>
>  target_ulong HELPER(aes64esm)(target_ulong rs1, target_ulong rs2)
>  {
>      AESState t;
> @@ -228,7 +138,16 @@ target_ulong HELPER(aes64ds)(target_ulong rs1, target_ulong rs2)
>  target_ulong HELPER(aes64dsm)(target_ulong rs1, target_ulong rs2)
>  {
> -    return aes64_operation(rs1, rs2, false, true);
> +    AESState t, z = { };

z can be const, otherwise:
Reviewed-by: Philippe Mathieu-Daudé

> +
> +    /*
> +     * This instruction does not include a round key,
> +     * so supply a zero to our primitive.
> +     */
> +    t.d[HOST_BIG_ENDIAN] = rs1;
> +    t.d[!HOST_BIG_ENDIAN] = rs2;
> +    aesdec_ISB_ISR_IMC_AK(&t, &t, &z, false);
> +    return t.d[HOST_BIG_ENDIAN];
> }
>
> target_ulong HELPER(aes64ks2)(target_ulong rs1, target_ulong rs2)
Re: [PATCH v4 13/37] host/include/aarch64: Implement aes-round.h
+Ard

On 3/7/23 12:04, Richard Henderson wrote:

Detect AES in cpuinfo; implement the accel hooks.

Signed-off-by: Richard Henderson
---
 meson.build                                  |   9 +
 host/include/aarch64/host/cpuinfo.h          |   1 +
 host/include/aarch64/host/crypto/aes-round.h | 205 +++
 util/cpuinfo-aarch64.c                       |   2 +
 4 files changed, 217 insertions(+)
 create mode 100644 host/include/aarch64/host/crypto/aes-round.h

diff --git a/meson.build b/meson.build
index a9ba0bfab3..029c6c0048 100644
--- a/meson.build
+++ b/meson.build
@@ -2674,6 +2674,15 @@ config_host_data.set('CONFIG_AVX512BW_OPT', get_option('avx512bw') \
     int main(int argc, char *argv[]) { return bar(argv[0]); }
   '''), error_message: 'AVX512BW not available').allowed())

+# For both AArch64 and AArch32, detect if builtins are available.
+config_host_data.set('CONFIG_ARM_AES_BUILTIN', cc.compiles('''
+  #include <arm_neon.h>
+  #ifndef __ARM_FEATURE_AES
+  __attribute__((target("+crypto")))
+  #endif
+  void foo(uint8x16_t *p) { *p = vaesmcq_u8(*p); }
+  '''))
+
 have_pvrdma = get_option('pvrdma') \
   .require(rdma.found(), error_message: 'PVRDMA requires OpenFabrics libraries') \
   .require(cc.compiles(gnu_source_prefix + '''

diff --git a/host/include/aarch64/host/cpuinfo.h b/host/include/aarch64/host/cpuinfo.h
index 82227890b4..05feeb4f43 100644
--- a/host/include/aarch64/host/cpuinfo.h
+++ b/host/include/aarch64/host/cpuinfo.h
@@ -9,6 +9,7 @@
 #define CPUINFO_ALWAYS (1u << 0) /* so cpuinfo is nonzero */
 #define CPUINFO_LSE    (1u << 1)
 #define CPUINFO_LSE2   (1u << 2)
+#define CPUINFO_AES    (1u << 3)

 /* Initialized with a constructor. */
 extern unsigned cpuinfo;

diff --git a/host/include/aarch64/host/crypto/aes-round.h b/host/include/aarch64/host/crypto/aes-round.h
new file mode 100644
index 00..8b5f88d50c
--- /dev/null
+++ b/host/include/aarch64/host/crypto/aes-round.h
@@ -0,0 +1,205 @@
+/*
+ * AArch64 specific aes acceleration.
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef AARCH64_HOST_CRYPTO_AES_ROUND_H
+#define AARCH64_HOST_CRYPTO_AES_ROUND_H
+
+#include "host/cpuinfo.h"
+#include <arm_neon.h>
+
+#ifdef __ARM_FEATURE_AES
+# define HAVE_AES_ACCEL  true
+#else
+# define HAVE_AES_ACCEL  likely(cpuinfo & CPUINFO_AES)
+#endif
+#if !defined(__ARM_FEATURE_AES) && defined(CONFIG_ARM_AES_BUILTIN)
+# define ATTR_AES_ACCEL  __attribute__((target("+crypto")))
+#else
+# define ATTR_AES_ACCEL
+#endif
+
+static inline uint8x16_t aes_accel_bswap(uint8x16_t x)
+{
+    return vqtbl1q_u8(x, (uint8x16_t){ 15, 14, 13, 12, 11, 10, 9, 8,
+                                       7, 6, 5, 4, 3, 2, 1, 0, });
+}
+
+#ifdef CONFIG_ARM_AES_BUILTIN
+# define aes_accel_aesd   vaesdq_u8
+# define aes_accel_aese   vaeseq_u8
+# define aes_accel_aesmc  vaesmcq_u8
+# define aes_accel_aesimc vaesimcq_u8
+# define aes_accel_aesd_imc(S, K) vaesimcq_u8(vaesdq_u8(S, K))
+# define aes_accel_aese_mc(S, K)  vaesmcq_u8(vaeseq_u8(S, K))
+#else
+static inline uint8x16_t aes_accel_aesd(uint8x16_t d, uint8x16_t k)
+{
+    asm(".arch_extension aes\n\t"
+        "aesd %0.16b, %1.16b" : "+w"(d) : "w"(k));
+    return d;
+}
+
+static inline uint8x16_t aes_accel_aese(uint8x16_t d, uint8x16_t k)
+{
+    asm(".arch_extension aes\n\t"
+        "aese %0.16b, %1.16b" : "+w"(d) : "w"(k));
+    return d;
+}
+
+static inline uint8x16_t aes_accel_aesmc(uint8x16_t d)
+{
+    asm(".arch_extension aes\n\t"
+        "aesmc %0.16b, %1.16b" : "=w"(d) : "w"(d));
+    return d;
+}
+
+static inline uint8x16_t aes_accel_aesimc(uint8x16_t d)
+{
+    asm(".arch_extension aes\n\t"
+        "aesimc %0.16b, %1.16b" : "=w"(d) : "w"(d));
+    return d;
+}
+
+/* Most CPUs fuse AESD+AESIMC in the execution pipeline. */
+static inline uint8x16_t aes_accel_aesd_imc(uint8x16_t d, uint8x16_t k)
+{
+    asm(".arch_extension aes\n\t"
+        "aesd %0.16b, %1.16b\n\t"
+        "aesimc %0.16b, %0.16b" : "+w"(d) : "w"(k));
+    return d;
+}
+
+/* Most CPUs fuse AESE+AESMC in the execution pipeline. */
+static inline uint8x16_t aes_accel_aese_mc(uint8x16_t d, uint8x16_t k)
+{
+    asm(".arch_extension aes\n\t"
+        "aese %0.16b, %1.16b\n\t"
+        "aesmc %0.16b, %0.16b" : "+w"(d) : "w"(k));
+    return d;
+}
+#endif /* CONFIG_ARM_AES_BUILTIN */
+
+static inline void ATTR_AES_ACCEL
+aesenc_MC_accel(AESState *ret, const AESState *st, bool be)
+{
+    uint8x16_t t = (uint8x16_t)st->v;
+
+    if (be) {
+        t = aes_accel_bswap(t);
+        t = aes_accel_aesmc(t);
+        t = aes_accel_bswap(t);
+    } else {
+        t = aes_accel_aesmc(t);
+    }
+    ret->v = (AESStateVec)t;
+}
+
+static inline void ATTR_AES_ACCEL
+aesenc_SB_SR_AK_accel(AESState *ret, const AESState *st,
+                      const AESState *rk, bool be)
+{
+    uint8x16_t t = (uint8x16_t)st->v;
+    uint8x16_t z = { };
+
+    if (be) {
+        t = aes_accel_bswap(t);
+
Re: [PATCH v4 00/37] crypto: Provide aes-round.h and host accel
On 3/7/23 12:04, Richard Henderson wrote:
> Inspired by Ard Biesheuvel's RFC patches for accelerating AES
> under emulation, provide a set of primitives that maps between
> the guest and host fragments.
>
> Changes for v4:
> * Fix typo in AESState (Max Chou)
> * Define AES_SH/ISH as macros (Ard Biesheuvel)
> * Group patches by subsystem.
>
> Patches lacking review:
> 12-host-include-i386-Implement-aes-round.h.patch

Deferring this one to Paolo & co,

> 13-host-include-aarch64-Implement-aes-round.h.patch

and this one to Ard :)

Possible cleanup to add in patch #4 "crypto/aes: Add AES_SH, AES_ISH
macros": declare 'extern const AESState aes_zero;' in
include/crypto/aes-round.h and define it in crypto/aes.c.

Regards,

Phil.
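Philippe's suggested cleanup, spelled out (a sketch of the declaration/definition pair, using exactly the names he gives):

```c
/* include/crypto/aes-round.h -- sketch of the suggested declaration */
extern const AESState aes_zero;

/* crypto/aes.c -- and its single definition */
const AESState aes_zero = { };
```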
Re: [PATCH v2 21/24] accel/tcg: Accept more page flags in page_check_range
On 7/7/23 22:40, Richard Henderson wrote:

Only PAGE_WRITE needs special attention, all others
can be handled as we do for PAGE_READ. Adjust the mask.

Signed-off-by: Richard Henderson
---
 accel/tcg/user-exec.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé
Re: [PATCH v2 17/24] linux-user: Use 'last' instead of 'end' in target_mmap
On 7/7/23 22:40, Richard Henderson wrote:

Complete the transition within the mmap functions to a formulation
that does not overflow at the end of the address space.

Signed-off-by: Richard Henderson
---
 linux-user/mmap.c | 45 +++++++++++++++++++++++-------------------------
 1 file changed, 23 insertions(+), 22 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé
Re: [PATCH] linux-user: make sure brk(0) returns a page-aligned value
On 7/8/23 19:26, Richard Henderson wrote:
> On 7/6/23 12:34, Andreas Schwab wrote:
>> Fixes: 86f04735ac ("linux-user: Fix brk() to release pages")
>> Signed-off-by: Andreas Schwab
>> ---
>>  linux-user/syscall.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/linux-user/syscall.c b/linux-user/syscall.c
>> index 08162cc966..e8a17377f5 100644
>> --- a/linux-user/syscall.c
>> +++ b/linux-user/syscall.c
>> @@ -805,7 +805,7 @@ static abi_ulong brk_page;
>>
>>  void target_set_brk(abi_ulong new_brk)
>>  {
>> -    target_brk = new_brk;
>> +    target_brk = TARGET_PAGE_ALIGN(new_brk);
>>      brk_page = HOST_PAGE_ALIGN(target_brk);
>>  }
>
> It makes sense, since that's how do_brk aligns things.

Yes, the patch looks good. I haven't tested, but it seems it adjusts the
initial brk(0) value only to make sure that it's target page aligned.

Maybe the title should be:
  linux-user: make sure the initial brk(0) is page-aligned
?

> I'm curious why this error might have produced host memory clobbering,
> but I'm not going to debug that.

I don't believe that this un-alignment triggers host memory clobbering
either.

Helge
Re: [PATCH] linux-user: make sure brk(0) returns a page-aligned value
On 7/8/23 23:36, Helge Deller wrote:
> On 7/8/23 19:26, Richard Henderson wrote:
>> On 7/6/23 12:34, Andreas Schwab wrote:
>>> Fixes: 86f04735ac ("linux-user: Fix brk() to release pages")
>>> [...]
>>
>> It makes sense, since that's how do_brk aligns things.
>> [...]
>> I'm curious why this error might have produced host memory clobbering,
>> but I'm not going to debug that.
>
> I don't believe that this un-alignment triggers host memory clobbering
> either.

See my follow-up in the other mail thread:
"Re: [RISC-V] ERROR:../accel/tcg/cpu-exec.c:1028:cpu_exec_setjmp: assertion failed: (cpu == current_cpu)"

Helge
Re: [RISC-V] ERROR:../accel/tcg/cpu-exec.c:1028:cpu_exec_setjmp: assertion failed: (cpu == current_cpu)
On 7/4/23 12:52, Andreas Schwab wrote:
> I think the issue is that the value returned from brk(0) is no longer
> page aligned.
>
> $ ./qemu-riscv64 -strace ../exe1
> 18329 brk(NULL) = 0x00303000
> 18329 faccessat(AT_FDCWD,"/etc/ld.so.preload",R_OK,0x3010d0) = -1 errno=2 (No such file or directory)
> 18329 openat(AT_FDCWD,"/etc/ld.so.cache",O_RDONLY|O_CLOEXEC) = 3
> 18329 newfstatat(3,"",0x0040007fe900,0x1000) = 0
> 18329 mmap(NULL,8799,PROT_READ,MAP_PRIVATE,3,0) = 0x004000824000
> 18329 close(3) = 0
> 18329 openat(AT_FDCWD,"/lib64/lp64d/libc.so.6",O_RDONLY|O_CLOEXEC) = 3
> 18329 read(3,0x7fea70,832) = 832
> 18329 newfstatat(3,"",0x0040007fe8f0,0x1000) = 0
> 18329 mmap(NULL,1405128,PROT_EXEC|PROT_READ,MAP_PRIVATE|MAP_DENYWRITE,3,0) = 0x004000827000
> 18329 mmap(0x00400096d000,20480,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_DENYWRITE|MAP_FIXED,3,0x146000) = 0x00400096d000
> 18329 mmap(0x004000972000,49352,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANONYMOUS|MAP_FIXED,-1,0) = 0x004000972000
> 18329 close(3) = 0
> 18329 mmap(NULL,8192,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANONYMOUS,-1,0) = 0x00400097f000
> 18329 set_tid_address(0x400097f710) = 18329
> 18329 set_robust_list(0x400097f720,24) = -1 errno=38 (Function not implemented)
> 18329 mprotect(0x00400096d000,12288,PROT_READ) = 0
> 18329 mprotect(0x00400082,4096,PROT_READ) = 0
> 18329 prlimit64(0,RLIMIT_STACK,NULL,0x0040007ff4f8) = 0 ({rlim_cur=8388608,rlim_max=-1})
> 18329 munmap(0x004000824000,8799) = 0
> 18329 newfstatat(1,"",0x0040007ff658,0x1000) = 0
> 18329 getrandom(0x4000976a40,8,1) = 8
> 18329 brk(NULL) = 0x00303000
> 18329 brk(0x00324000) = 0x00324000
> 18329 write(1,0x3032a0,12)Hello world = 12
> 18329 exit_group(0)
>
> $ qemu-riscv64 -strace ../exe1
> 18369 brk(NULL) = 0x003022e8
> 18369 faccessat(AT_FDCWD,"/etc/ld.so.preload",R_OK,0x3010d0) = -1 errno=2 (No such file or directory)
> 18369 openat(AT_FDCWD,"/etc/ld.so.cache",O_RDONLY|O_CLOEXEC) = 3
> 18369 newfstatat(3,"",0x0040007fe8f0,0x1000) = 0
> 18369 mmap(NULL,8799,PROT_READ,MAP_PRIVATE,3,0) = 0x004000824000
> 18369 close(3) = 0
> 18369 openat(AT_FDCWD,"/lib64/lp64d/libc.so.6",O_RDONLY|O_CLOEXEC) = 3
> 18369 read(3,0x7fea60,832) = 832
> 18369 newfstatat(3,"",0x0040007fe8e0,0x1000) = 0
> 18369 mmap(NULL,1405128,PROT_EXEC|PROT_READ,MAP_PRIVATE|MAP_DENYWRITE,3,0) = 0x004000827000
> 18369 mmap(0x00400096d000,20480,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_DENYWRITE|MAP_FIXED,3,0x146000) = 0x00400096d000
> 18369 mmap(0x004000972000,49352,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANONYMOUS|MAP_FIXED,-1,0) = 0x004000972000
> 18369 close(3) = 0
> 18369 mmap(NULL,8192,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANONYMOUS,-1,0) = 0x00400097f000
> 18369 set_tid_address(0x400097f710) = 18369
> 18369 set_robust_list(0x400097f720,24) = -1 errno=38 (Function not implemented)
> 18369 mprotect(0x00400096d000,12288,PROT_READ) = 0
> 18369 mprotect(0x00400082,4096,PROT_READ) = 0
> 18369 prlimit64(0,RLIMIT_STACK,NULL,0x0040007ff4e8) = 0 ({rlim_cur=8388608,rlim_max=-1})
> 18369 munmap(0x004000824000,8799) = 0
> 18369 newfstatat(1,"",0x0040007ff648,0x1000) = 0
> 18369 getrandom(0x4000976a40,8,1) = 8
> 18369 brk(NULL) = 0x003022e8
> 18369 brk(0x003232e8)** ERROR:../accel/tcg/cpu-exec.c:1028:cpu_exec_setjmp: assertion failed: (cpu == current_cpu)
> Bail out! ERROR:../accel/tcg/cpu-exec.c:1028:cpu_exec_setjmp: assertion failed: (cpu == current_cpu)
> ** ERROR:../accel/tcg/cpu-exec.c:1028:cpu_exec_setjmp: assertion failed: (cpu == current_cpu)
> Bail out! ERROR:../accel/tcg/cpu-exec.c:1028:cpu_exec_setjmp: assertion failed: (cpu == current_cpu)

This reminds me of a failure I once saw on the hppa target.
See commit bd4b7fd6ba98 ("linux-user/hppa: Fix segfaults on page zero").

Maybe the not-page-aligned brk address triggers the glibc or application
in the guest to jump somewhere else (see cpu_exec_setjmp)? The example in
my commit message jumped to address 0, which isn't writeable for
applications in the target machine, and qemu failed to trigger/handle
the correct target exception.

I think your patch to page-align the initial brk() is correct, but it
probably just hides the real problem. Maybe you are able to test what
happens with exe1 on a physical RISC-V machine if the brk address isn't
page aligned? Maybe you are missing some exception handling for RISC-V
in qemu too?

Helge
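A quick way to answer the "what does real hardware do" question is a tiny probe that requests a deliberately unaligned program break via the raw syscall. This is a sketch added for illustration (exe1 itself is not shown in the thread):

```c
/* Probe how the kernel (or qemu) handles an unaligned brk request. */
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

int main(void)
{
    unsigned long cur = syscall(SYS_brk, 0);
    unsigned long ret = syscall(SYS_brk, cur + 0x123); /* unaligned on purpose */

    printf("brk(0)       = %#lx\n", cur);
    printf("brk(+0x123)  = %#lx\n", ret);
    printf("brk(0) again = %#lx\n", (unsigned long)syscall(SYS_brk, 0));
    return 0;
}
```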