This is now mostly ready and has been tested quite heavily, but I expect to repost a final version once the PC-relative code generation patches are in. I also plan to do more testing in the meantime, which might well find other bugs of course. I have not looked at all into XSAVE/XRSTOR support for the usermode emulation sigcontext; this is already a missing feature, but it becomes more important for AVX.
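
As background on why XSAVE/XRSTOR matters for AVX: a guest is only supposed to use AVX once CPUID reports both AVX and OSXSAVE, and XCR0 has the SSE and AVX state bits set. The following is a minimal sketch of that check, not code from this series; avx_usable and the macro names are hypothetical, while the bit positions are the architectural ones.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.(EAX=1):ECX feature bits (architectural positions). */
#define CPUID_1_ECX_OSXSAVE (1u << 27)
#define CPUID_1_ECX_AVX     (1u << 28)

/* XCR0 state-component bits. */
#define XCR0_SSE (1ull << 1)
#define XCR0_AVX (1ull << 2)

/*
 * Hypothetical helper, for illustration only: returns true when software
 * may execute AVX instructions, given the value of CPUID.1:ECX and the
 * value that XGETBV returns for XCR0.
 */
static bool avx_usable(uint32_t cpuid_1_ecx, uint64_t xcr0)
{
    const uint32_t need_cpuid = CPUID_1_ECX_OSXSAVE | CPUID_1_ECX_AVX;
    const uint64_t need_xcr0 = XCR0_SSE | XCR0_AVX;

    /* The CPU must support AVX and the OS must have enabled XSAVE. */
    if ((cpuid_1_ecx & need_cpuid) != need_cpuid) {
        return false;
    }
    /* The OS must have enabled both XMM and YMM state in XCR0. */
    return (xcr0 & need_xcr0) == need_xcr0;
}
```

An OS sets those XCR0 bits through XSETBV and then relies on XSAVE/XRSTOR to context-switch the upper YMM halves, which is why system emulation needs them before AVX can be enabled at all.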

Compared to the previous RFC there are a bunch of bugfixes, mostly for big-endian systems but also for system emulation (XSAVE/XRSTOR, without which OSes cannot enable AVX even though usermode emulation can cheat). They are detailed below. Code generation changes cover mostly what was pointed out in the review, but also reusing the new functionality introduced to fix bugs. 3DNow has been converted to the new decoder.

The series is at the i386 branch of https://gitlab.com/bonzini/qemu, up to commit 94743924ea14103e348eb4ca533945213fa4018a. The final patch, removing the old SSE decoder, seems to be too big for the mailing list, so I removed the big hunk in the middle that just deletes gen_sse and the tables above it.

Paolo

Bugfixes from v1:
* enter MMX for PSHUFW
* categorized MOVNTSS, MOVNTSD as SSE4A
* categorized CVTPI2PS, CVTPI2PD, CVTPS2PI, CVTPD2PI, CVTTPS2PI, CVTTPD2PI as non-VEX
* fixed length of argument of CVTPS2PI and CVTTPS2PI
* fixed X86_SPECIAL_AVXExtMov which reversed MO_128/MO_256
* tested SSE4a and AES
* finished implementation of 256-bit AES instructions
* removed some unnecessary/wrong X86_SPECIAL_MMX annotations
* fix signedness of 0F3Ah immediates
* fixed big-endian support in patch 2 (old decoder)
* fixed big-endian support in MOVLPx, MOVHPx, MOVLHPS, MOVSD, MOVSS, PMOVMSKB, VEXTRACTx128, VGATHER (new decoder)
* tested system emulation, which actually covers XSAVE/XRSTOR

Other code generation changes from v1:
* more operations (addus, adds, subus, subs, minu, mins, mullw, mulld, broadcast, abs) moved to gvec
* pointer temps for helpers are generated lazily
* implement alignment restrictions for SSE instructions
* PMOVMSKB now uses extract2 or deposit
* looked into using maxsz > oprsz feature, but it does not work on big-endian hosts
* change tcg_const to tcg_constant
* fixed register changes before loads; unaligned loads always go through a temporary for the same reason
* reimplemented VZEROALL using gen_helper_memset
* reimplemented VZEROUPPER using gvec moves
* introduced new function vector_elem_offset, mostly for big-endian but it has a few other uses

Testing changes from v1:
* added more AES and VAES testcases

Decoding changes from v1:
* removed #define of gen_V* to gen_P*
* split group 12/13/14 decoding
* converted 3DNow to new decoder
* used decode_by_prefix where applicable
* interpret prefixes at decode time for 0F5B, 0F77, 0F78, 0F79, 0F7E, 0FE6
* cleaned up 0F6F, splitting 0F7F out of it

Other cleanups from v1:
* added remark on VEX.256 being available for MOVLPx
* changed disas_insn_new to return void
* moved switch labels out of if statements
* changed abort() to g_assert_not_reached()
* left out "default: abort()" altogether when applicable
* fixed spacing around vgather helpers
* removed some (most) inline markers, compiled with clang
* added const markers to all X86OpEntry arrays
* squashed move of scalar VEX operations into a single patch
* fixed checkpatch complaints (outside the table)
* improved some commit messages

Paolo Bonzini (32):
  target/i386: make ldo/sto operations consistent with ldq
  target/i386: REPZ and REPNZ are mutually exclusive
  target/i386: introduce insn_get_addr
  target/i386: add core of new i386 decoder
  target/i386: add ALU load/writeback core
  target/i386: add CPUID[EAX=7,ECX=0].ECX to DisasContext
  target/i386: add CPUID feature checks to new decoder
  target/i386: validate VEX prefixes via the instructions' exception classes
  target/i386: validate SSE prefixes directly in the decoding table
  target/i386: move scalar 0F 38 and 0F 3A instruction to new decoder
  target/i386: extend helpers to support VEX.V 3- and 4- operand encodings
  target/i386: support operand merging in binary scalar helpers
  target/i386: provide 3-operand versions of unary scalar helpers
  target/i386: implement additional AVX comparison operators
  target/i386: Introduce 256-bit vector helpers
  target/i386: reimplement 0x0f 0x60-0x6f, add AVX
  target/i386: reimplement 0x0f 0xd8-0xdf, 0xe8-0xef, 0xf8-0xff, add AVX
  target/i386: reimplement 0x0f 0x50-0x5f, add AVX
  target/i386: reimplement 0x0f 0x78-0x7f, add AVX
  target/i386: reimplement 0x0f 0x70-0x77, add AVX
  target/i386: reimplement 0x0f 0xd0-0xd7, 0xe0-0xe7, 0xf0-0xf7, add AVX
  target/i386: clarify (un)signedness of immediates from 0F3Ah opcodes
  target/i386: reimplement 0x0f 0x3a, add AVX
  target/i386: reimplement 0x0f 0x38, add AVX
  target/i386: reimplement 0x0f 0xc2, 0xc4-0xc6, add AVX
  target/i386: reimplement 0x0f 0x10-0x17, add AVX
  target/i386: reimplement 0x0f 0x28-0x2f, add AVX
  target/i386: implement XSAVE and XRSTOR of AVX registers
  target/i386: implement VLDMXCSR/VSTMXCSR
  tests/tcg: extend SSE tests to AVX
  target/i386: move 3DNow to the new decoder
  target/i386: remove old SSE decoder

Paul Brook (3):
  target/i386: add AVX_EN hflag
  target/i386: Prepare ops_sse_header.h for 256 bit AVX
  target/i386: Enable AVX cpuid bits when using TCG

Richard Henderson (2):
  target/i386: Define XMMReg and access macros, align ZMM registers
  target/i386: Use tcg gvec ops for pmovmskb

 target/i386/cpu.c                |   10 +-
 target/i386/cpu.h                |   59 +-
 target/i386/helper.c             |   12 +
 target/i386/helper.h             |    2 +
 target/i386/ops_sse.h            |  700 ++++++----
 target/i386/ops_sse_header.h     |  347 +++--
 target/i386/tcg/decode-new.c.inc | 1791 ++++++++++++++++++++++++
 target/i386/tcg/decode-new.h     |  249 ++++
 target/i386/tcg/emit.c.inc       | 2234 ++++++++++++++++++++++++++++++
 target/i386/tcg/fpu_helper.c     |   82 +-
 target/i386/tcg/translate.c      | 2117 ++--------------------------
 tests/tcg/i386/Makefile.target   |    2 +-
 tests/tcg/i386/test-avx.c        |  201 +--
 tests/tcg/i386/test-avx.py       |    5 +-
 14 files changed, 5298 insertions(+), 2513 deletions(-)
 create mode 100644 target/i386/tcg/decode-new.c.inc
 create mode 100644 target/i386/tcg/decode-new.h
 create mode 100644 target/i386/tcg/emit.c.inc

-- 
2.37.2