This patchset implements emulation of the Arm FEAT_AFP and FEAT_RPRES extensions, which are floating-point related. (Summary of what these are exactly is at the bottom of the cover letter.)

If you'd rather have these patches as a git branch:
 https://git.linaro.org/people/pmaydell/qemu-arm.git feat-afp
with human readable web view at:
 https://git.linaro.org/people/peter.maydell/qemu-arm.git/log/?h=feat-afp

Changes between v1 and v2:
 * first part of the series has been upstreamed
 * I've left the first two x86 patches in here, just to avoid having
   to use a Based-on: tag. They've both been taken by Paolo already,
   they just haven't landed upstream yet.
 * the tail-end patches fixing x86 denormal support are not posted
   here (indeed I didn't mean to send them in v1!); I'll send those
   separately once the underlying softfloat patches are upstream
 * the renaming of the FPST_ constants (already upstream) is
   carried through into these patches
 * name changes in the "allow flushing of output denormals to be
   after rounding" patch: now set_float_ftz_detection(),
   get_float_ftz_detection(), float_ftz_after_rounding and
   float_ftz_before_rounding
 * moved select_fpst to translate-a64.h and renamed to select_ah_fpst
 * use vec_full_reg_offset() in the write_fp_*reg_merging fns
 * drop no-longer-needed float*_input_flush2() calls in the
   float*_hs_compare() fns in "implement float_flag_input_denormal_used"
 * adopted RTH's patchset, by a mix of merging in fixes to my patches
   and adding his (partly on the end, and partly sorted into the
   series at appropriate places). I updated commit messages in a
   few places (notably standardising them onto "Handle X for <some insn>"
   rather than "for <some QEMU function>")

Patches that still need review:
 04 fpu: Implement float_flag_input_denormal_used
 05 fpu: allow flushing of output denormals to be after rounding
 06 target/arm: Define FPCR AH, FIZ, NEP bits

RTH: I kept your r-by tags on the patches where I squashed in
your fixes from your followup series (mostly this is the
changes to use the muladd flags).
If you want to re-review to check that I did the squashing right,
those are patches:
 37 target/arm: Handle FPCR.AH in negation steps in SVE FCADD
 38 target/arm: Handle FPCR.AH in negation steps in FCADD
 41 target/arm: Handle FPCR.AH in negation step in FMLS (indexed)
 42 target/arm: Handle FPCR.AH in negation in FMLS (vector)
 43 target/arm: Handle FPCR.AH in negation step in SVE FMLS (vector)
 44 target/arm: Handle FPCR.AH in SVE FTSSEL
 45 target/arm: Handle FPCR.AH in SVE FTMAD

Summary of what FEAT_AFP/FEAT_RPRES are, from v1 cover letter:

FEAT_AFP defines three new control bits in the FPCR, whose
operations are basically independent of each other:

 * FPCR.AH: "alternate floating point mode"; this changes floating
   point behaviour in a variety of ways, including:
    - the sign of a default NaN is 1, not 0
    - if FPCR.FZ is also 1, results detected as denormal after
      rounding with an unbounded exponent are flushed to zero
    - FPCR.FZ does not cause denormalized inputs to be flushed to zero
    - miscellaneous other corner-case behaviour changes
 * FPCR.FIZ: flush denormalized numbers to zero on input for
   most instructions
 * FPCR.NEP: makes scalar SIMD operations merge the result with
   higher vector elements in one of the source registers, instead
   of zeroing the higher elements of the destination

FEAT_RPRES makes single-precision FRECPE and FRSQRTE use a 12-bit
mantissa precision instead of 8-bit when FPCR.AH is set.
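
To make two of the behaviours in the summary concrete, here is a
minimal C sketch of the FPCR.AH default NaN sign and of FPCR.NEP
merging. It is not code from this series: every name in it
(sketch_default_nan32, sketch_write_scalar, SKETCH_ELEMS) is
hypothetical, and it assumes a 128-bit vector viewed as four 32-bit
elements with the scalar result in element 0.

```c
/* Illustrative sketch only: none of these names exist in QEMU. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_ELEMS 4 /* assumed: 128-bit vector as four 32-bit elements */

/*
 * Bit pattern of the single-precision default NaN.
 * The sign bit is 0 normally and 1 when FPCR.AH is set.
 */
static uint32_t sketch_default_nan32(bool fpcr_ah)
{
    uint32_t quiet_nan = 0x7fc00000u;

    return fpcr_ah ? (quiet_nan | 0x80000000u) : quiet_nan;
}

/*
 * Write a scalar result into element 0 of dst.
 * With FPCR.NEP clear the higher elements of dst are zeroed; with
 * FPCR.NEP set they are taken from one of the source registers
 * (merge_src here).
 */
static void sketch_write_scalar(uint32_t dst[SKETCH_ELEMS],
                                const uint32_t merge_src[SKETCH_ELEMS],
                                uint32_t result, bool fpcr_nep)
{
    for (int i = 1; i < SKETCH_ELEMS; i++) {
        dst[i] = fpcr_nep ? merge_src[i] : 0;
    }
    dst[0] = result;
}

/* Small self-check exercising both helpers. */
static void sketch_self_check(void)
{
    uint32_t src[SKETCH_ELEMS] = { 1, 2, 3, 4 };
    uint32_t dst[SKETCH_ELEMS] = { 9, 9, 9, 9 };

    assert(sketch_default_nan32(false) == 0x7fc00000u);
    assert(sketch_default_nan32(true) == 0xffc00000u);

    sketch_write_scalar(dst, src, 42, true);   /* NEP = 1: merge */
    assert(dst[0] == 42 && dst[1] == 2 && dst[2] == 3 && dst[3] == 4);

    sketch_write_scalar(dst, src, 7, false);   /* NEP = 0: zero */
    assert(dst[0] == 7 && dst[1] == 0 && dst[2] == 0 && dst[3] == 0);
}
```

Calling sketch_self_check() from a test program runs both cases. The
sketch deliberately ignores which source register an instruction
merges from and how this is wired into the translator; those details
are in the patches themselves.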

thanks
-- PMM

Peter Maydell (50):
  target/i386: Do not raise Invalid for 0 * Inf + QNaN
  tests/tcg/x86_64/fma: Test some x86 fused-multiply-add cases
  fpu: Add float_class_denormal
  fpu: Implement float_flag_input_denormal_used
  fpu: allow flushing of output denormals to be after rounding
  target/arm: Define FPCR AH, FIZ, NEP bits
  target/arm: Implement FPCR.FIZ handling
  target/arm: Adjust FP behaviour for FPCR.AH = 1
  target/arm: Adjust exception flag handling for AH = 1
  target/arm: Add FPCR.AH to tbflags
  target/arm: Set up float_status to use for FPCR.AH=1 behaviour
  target/arm: Use FPST_FPCR_AH for FRECPE, FRECPS, FRECPX, FRSQRTE, FRSQRTS
  target/arm: Use FPST_FPCR_AH for BFCVT* insns
  target/arm: Use FPST_FPCR_AH for BFMLAL*, BFMLSL* insns
  target/arm: Add FPCR.NEP to TBFLAGS
  target/arm: Define and use new write_fp_*reg_merging() functions
  target/arm: Handle FPCR.NEP for 3-input scalar operations
  target/arm: Handle FPCR.NEP for BFCVT scalar
  target/arm: Handle FPCR.NEP for 1-input scalar operations
  target/arm: Handle FPCR.NEP in do_cvtf_scalar()
  target/arm: Handle FPCR.NEP for scalar FABS and FNEG
  target/arm: Handle FPCR.NEP for FCVTXN (scalar)
  target/arm: Handle FPCR.NEP for NEP for FMUL, FMULX scalar by element
  target/arm: Implement FPCR.AH semantics for scalar FMIN/FMAX
  target/arm: Implement FPCR.AH semantics for vector FMIN/FMAX
  target/arm: Implement FPCR.AH semantics for FMAXV and FMINV
  target/arm: Implement FPCR.AH semantics for FMINP and FMAXP
  target/arm: Implement FPCR.AH semantics for SVE FMAXV and FMINV
  target/arm: Implement FPCR.AH semantics for SVE FMIN/FMAX immediate
  target/arm: Implement FPCR.AH semantics for SVE FMIN/FMAX vector
  target/arm: Implement FPCR.AH handling of negation of NaN
  target/arm: Implement FPCR.AH handling for scalar FABS and FABD
  target/arm: Handle FPCR.AH in vector FABD
  target/arm: Handle FPCR.AH in SVE FNEG
  target/arm: Handle FPCR.AH in SVE FABS
  target/arm: Handle FPCR.AH in SVE FABD
  target/arm: Handle FPCR.AH in negation steps in SVE FCADD
  target/arm: Handle FPCR.AH in negation steps in FCADD
  target/arm: Handle FPCR.AH in FRECPS and FRSQRTS scalar insns
  target/arm: Handle FPCR.AH in FRECPS and FRSQRTS vector insns
  target/arm: Handle FPCR.AH in negation step in FMLS (indexed)
  target/arm: Handle FPCR.AH in negation in FMLS (vector)
  target/arm: Handle FPCR.AH in negation step in SVE FMLS (vector)
  target/arm: Handle FPCR.AH in SVE FTSSEL
  target/arm: Handle FPCR.AH in SVE FTMAD
  target/arm: Enable FEAT_AFP for '-cpu max'
  target/arm: Plumb FEAT_RPRES frecpe and frsqrte through to new helper
  target/arm: Implement increased precision FRECPE
  target/arm: Implement increased precision FRSQRTE
  target/arm: Enable FEAT_RPRES for -cpu max

Richard Henderson (19):
  target/arm: Handle FPCR.AH in vector FCMLA
  target/arm: Handle FPCR.AH in FCMLA by index
  target/arm: Handle FPCR.AH in SVE FCMLA
  target/arm: Handle FPCR.AH in FMLSL (by element and vector)
  target/arm: Handle FPCR.AH in SVE FMLSL (indexed)
  target/arm: Handle FPCR.AH in SVE FMLSLB, FMLSLT (vectors)
  target/arm: Introduce CPUARMState.vfp.fp_status[]
  target/arm: Remove standard_fp_status_f16
  target/arm: Remove standard_fp_status
  target/arm: Remove ah_fp_status_f16
  target/arm: Remove ah_fp_status
  target/arm: Remove fp_status_f16_a64
  target/arm: Remove fp_status_f16_a32
  target/arm: Remove fp_status_a64
  target/arm: Remove fp_status_a32
  target/arm: Simplify fp_status indexing in mve_helper.c
  target/arm: Simplify DO_VFP_cmp in vfp_helper.c
  target/arm: Read fz16 from env->vfp.fpcr
  target/arm: Sink fp_status and fpcr access into do_fmlal*

 docs/system/arm/emulation.rst    |   2 +
 include/fpu/softfloat-helpers.h  |  11 +
 include/fpu/softfloat-types.h    |  41 +-
 target/arm/cpu-features.h        |  10 +
 target/arm/cpu.h                 |  97 ++--
 target/arm/helper.h              |  26 +
 target/arm/internals.h           |   6 +
 target/arm/tcg/helper-a64.h      |  13 +
 target/arm/tcg/helper-sve.h      | 120 +++++
 target/arm/tcg/translate-a64.h   |  13 +
 target/arm/tcg/translate.h       |  54 +--
 target/arm/tcg/vec_internal.h    |  35 ++
 target/mips/fpu_helper.h         |   6 +
 fpu/softfloat.c                  |  66 ++-
 target/alpha/cpu.c               |   7 +
 target/arm/cpu.c                 |  46 +-
 target/arm/helper.c              |   2 +-
 target/arm/tcg/cpu64.c           |   2 +
 target/arm/tcg/helper-a64.c      | 151 +++---
 target/arm/tcg/hflags.c          |  13 +
 target/arm/tcg/mve_helper.c      |  44 +-
 target/arm/tcg/sme_helper.c      |   4 +-
 target/arm/tcg/sve_helper.c      | 367 +++++++++++----
 target/arm/tcg/translate-a64.c   | 782 +++++++++++++++++++++++++------
 target/arm/tcg/translate-sve.c   | 193 ++++++--
 target/arm/tcg/vec_helper.c      | 387 ++++++++++-----
 target/arm/vfp_helper.c          | 372 ++++++++++++---
 target/hppa/fpu_helper.c         |  11 +
 target/i386/tcg/fpu_helper.c     |  13 +-
 target/mips/msa.c                |   9 +
 target/ppc/cpu_init.c            |   3 +
 target/rx/cpu.c                  |   8 +
 target/sh4/cpu.c                 |   8 +
 target/tricore/helper.c          |   1 +
 tests/fp/fp-bench.c              |   1 +
 tests/tcg/x86_64/fma.c           | 109 +++++
 fpu/softfloat-parts.c.inc        | 132 +++++-
 tests/tcg/x86_64/Makefile.target |   1 +
 38 files changed, 2452 insertions(+), 714 deletions(-)
 create mode 100644 tests/tcg/x86_64/fma.c

--
2.34.1