The v3 ISA for x86_64 includes F16C and FMA instructions in addition to AVX2, which was just committed for QEMU 7.2. This small series implements these two features and terminates my excursion into x86 TCG. :)

Patch 1 is a bugfix for the new decoder (sob).

Patch 2 introduces a common function to set the rounding mode from an x86 2-bit rounding-mode field (as specified in FSTCW, STMXCSR and VROUND instructions). Since the same operand is used by the F16C instruction VCVTPS2PH, it is time to reduce the code duplication instead of adding more.

Patches 3 and 4 are the actual implementation, which includes the helpers, decoding, TCG op emission and tests. Output from QEMU matches that from native x86.

Paolo Bonzini (4):
  target/i386: decode-new: avoid out-of-bounds access to xmm_regs[-1]
  target/i386: introduce function to set rounding mode from FPCW or MXCSR bits
  target/i386: implement F16C instructions
  target/i386: implement FMA instructions

 target/i386/cpu.c                |   8 +-
 target/i386/cpu.h                |   3 +
 target/i386/ops_sse.h            | 152 +++++++++++++++++++------------
 target/i386/ops_sse_header.h     |  34 +++++++
 target/i386/tcg/decode-new.c.inc |  46 ++++++++++
 target/i386/tcg/decode-new.h     |   3 +
 target/i386/tcg/emit.c.inc       |  60 +++++++++++-
 target/i386/tcg/fpu_helper.c     |  60 +++++-------
 tests/tcg/i386/test-avx.c        |  17 ++++
 tests/tcg/i386/test-avx.py       |   8 +-
 10 files changed, 289 insertions(+), 102 deletions(-)

-- 
2.37.3
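
For readers unfamiliar with the 2-bit rounding-mode field that patch 2 deals with, here is a minimal, self-contained sketch of the idea. It is not the code from the series: the enum, the macros and every function name below are hypothetical, and only the bit layout (RC in FPCW bits 11:10, MXCSR bits 14:13, and imm8 bits 1:0 with bit 2 selecting MXCSR.RC) comes from the x86 architecture.

```c
#include <stdint.h>

/* Hypothetical stand-in for an emulator's rounding-mode type;
 * these are NOT the names used in QEMU. */
typedef enum {
    ROUND_NEAREST_EVEN,
    ROUND_DOWN,      /* toward -infinity */
    ROUND_UP,        /* toward +infinity */
    ROUND_TO_ZERO,   /* truncate */
} RoundingMode;

/* Position of the 2-bit RC field in each control register. */
#define FPCW_RC_SHIFT   10
#define MXCSR_RC_SHIFT  13

/* Single place where the x86 2-bit encoding is decoded. */
static RoundingMode rc_to_rounding_mode(unsigned rc)
{
    switch (rc & 3) {
    case 0:  return ROUND_NEAREST_EVEN;
    case 1:  return ROUND_DOWN;
    case 2:  return ROUND_UP;
    default: return ROUND_TO_ZERO;
    }
}

/* Rounding mode selected by the x87 control word. */
static RoundingMode rounding_mode_from_fpcw(uint16_t fpcw)
{
    return rc_to_rounding_mode(fpcw >> FPCW_RC_SHIFT);
}

/* Rounding mode selected by MXCSR. */
static RoundingMode rounding_mode_from_mxcsr(uint32_t mxcsr)
{
    return rc_to_rounding_mode(mxcsr >> MXCSR_RC_SHIFT);
}

/* Rounding mode for an immediate operand of the VROUND/VCVTPS2PH kind:
 * bit 2 set means "use MXCSR.RC", otherwise bits 1:0 hold the mode. */
static RoundingMode rounding_mode_from_imm8(uint8_t imm8, uint32_t mxcsr)
{
    if (imm8 & 4) {
        return rounding_mode_from_mxcsr(mxcsr);
    }
    return rc_to_rounding_mode(imm8);
}
```

With all three callers going through rc_to_rounding_mode(), the 2-bit encoding is decoded in exactly one switch statement, which is the kind of deduplication the patch description is after.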