On Fri, 10 Jan 2025, 10:52 Michael Clark <mich...@anarch128.org> wrote:

> a note to announce a port of the x86-mini disassembler to QEMU.
>
> - https://github.com/michaeljclark/qemu/tree/x86-mini


I assume the huge .h files are autogenerated? If so, QEMU cannot use them
without including the human-readable sources in the tree.

I can see how that might be interesting for x86 virtualization where you
have only one target and therefore you can get rid of the capstone
dependency. At the same time, other virtualization targets like arm64 and
RISC-V are going to become more and more important—not less—and not having
to maintain a disassembler ourselves as part of QEMU is also a big plus...

Paolo


> - https://github.com/michaeljclark/x86/tree/x86-mini
>
> # x86-mini
>
> the x86-mini library is a lightweight x86 encoder, decoder, and
> disassembler that uses extensions to the Intel instruction set
> metadata format to encode modern VEX/EVEX instructions and legacy
> instructions using a parameterized LEX (legacy extension) format.
>
> - metadata-driven disassembler with Intel format output.
> - written in C11 for compatibility with projects written in C.
> - low-level instruction encoder and decoder uses <= 32 bytes.
> - python tablegen program to generate C tables from CSV metadata.
> - metadata table tool to inspect operand encode and decode tables.
> - carefully checked machine-readable instruction set metadata.
> - support for REX/VEX/EVEX and preliminary support for REX2.
>
> the x86-mini x86 encoder and decoder library has been written from
> scratch to be modern and as simple as possible while also covering
> recent additions to the Intel and AMD 64-bit instruction sets such
> as the EVEX encodings for recent AVX-512 extensions and soon REX2/
> EVEX encodings for Intel APX, as it is written with that in mind.
>
> ## interest to the QEMU community
>
> - x86-mini is fast. raw decode performance is ~100-200MiB/sec.
> - x86-mini is small. 5 files, ~5 KLOC or ~13 KLOC including tables.
> - x86-mini is complete and includes the latest AVX-512 extensions.
> - x86-mini is easy to extend and uses extended Intel format metadata.
> - x86-mini is documented with detailed info on the metadata format.
> - x86-mini has CLI tools for searching x86 instruction set metadata.
>
> ## technical notes
>
> - the decoder is table-based and uses a metadata interpreter.
> - the decode table is ~66KiB with a ~150KiB acceleration trie.
> - there are currently 3658 opcode entries active on x86-64
>   which expands to 4775 table entries due to parameterization.
> - it could be made faster by vectorizing the prefix decoder and
>   generating decode templates from the metadata to consteval
>   metadata interpretation to eliminate some L1 D$ traffic.
>
> after cherry-picking the commit, one can test host and target
> disassembly support. e.g. for an x86-64 target on an x86-64 host:
>
> $ echo aaa | qemu-x86_64 -d in_asm,out_asm /usr/bin/openssl sha256
>
> ## caveats and limitations
>
> - supports 32-bit and 64-bit disassembly, and theoretically 16-bit.
> - designed to support 16-bit but base index formats are not done yet.
> - x86-64 is exhaustively fuzz-tested against the LLVM disassembler.
> - but x86-mini is new and hasn't been battle-tested in production.
>
> if you already link with capstone then it doesn't provide very many
> immediate benefits, however, I think it is potentially useful as a
> small embeddable disassembler to evaluate for potential inclusion.
>
> ## rationale
>
> I worked on the QEMU disassembler while working on the QEMU RISC-V
> target back in 2017/2018 and I was curious about vector support.
> it seemed at the time that TCG vector support was piecemeal, plus
> the old x86 disassembler seemed messy and incomplete. I also needed
> an MIT-licensed disassembler to enable use in a commercial product.
> basically, I was looking for a lightweight symmetric x86 instruction
> encoder and decoder library in pure C with simple build requirements.
> that is what prompted this initiative.
>
> it would be nice to have an x86 disassembler building out-of-the-box
> as I find QEMU's built-in tracing extremely useful and given x86 is
> a popular target, a small embedded disassembler might be practical.
>
> ## summary and conclusion
>
> at minimum, the metadata may be useful for x86 EVEX support. note
> I see `tests/tcg/i386/x86.csv` in the source tree. the metadata is
> also based on x86-csv but has had numerous inaccuracies fixed as
> well as conversion of legacy instructions to the new LEX format.
> in effect the metadata has been fuzz-tested against LLVM for x86-64
> and ISA coverage is on the order of 99.7%. the main branch of the
> linked repo has a procedural fuzzer for metadata-based instruction
> synthesis that could be useful for generating test cases for QEMU.
>
> I am kind of throwing this over the fence, although the code is quite
> self-contained and my stress and mental health is now under control.
> also I have not yet run checkpatch.pl on this code. it is a preview.
>
> x86-mini submaintainer.
> Michael Clark.
> --
>
>
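
The quoted technical notes mention a prefix decoder that runs ahead of the table lookup. As a rough illustration of what such a step involves, here is a minimal C sketch that classifies the prefix bytes of one instruction. It is not taken from x86-mini: `pfx_kind`, `pfx_info`, `is_legacy_prefix`, and `scan_prefixes` are hypothetical names. The sketch assumes 64-bit mode only and does no validation (for example, it does not reject a REX byte followed by a VEX prefix).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration; not part of x86-mini. */
typedef enum {
    PFX_NONE, /* no encoding prefix: legacy or REX-only instruction */
    PFX_VEX2, /* 0xC5: two-byte VEX */
    PFX_VEX3, /* 0xC4: three-byte VEX */
    PFX_EVEX, /* 0x62: EVEX */
    PFX_REX2  /* 0xD5: REX2 (Intel APX) */
} pfx_kind;

typedef struct {
    size_t   legacy_count; /* number of legacy prefix bytes consumed */
    uint8_t  rex;          /* REX byte, or 0 if absent */
    pfx_kind kind;         /* encoding prefix found at 'offset', if any */
    size_t   offset;       /* first byte not consumed as legacy/REX prefix */
} pfx_info;

/* Returns 1 if b is a legacy prefix byte, 0 otherwise. */
static int is_legacy_prefix(uint8_t b)
{
    switch (b) {
    case 0x26: case 0x2E: case 0x36: case 0x3E: /* ES, CS, SS, DS overrides */
    case 0x64: case 0x65:                       /* FS, GS overrides */
    case 0x66: case 0x67:                       /* operand-size, address-size */
    case 0xF0: case 0xF2: case 0xF3:            /* LOCK, REPNE, REP */
        return 1;
    default:
        return 0;
    }
}

/* Scan the leading prefix bytes of one instruction (64-bit mode assumed). */
static pfx_info scan_prefixes(const uint8_t *buf, size_t len)
{
    pfx_info p = { 0, 0, PFX_NONE, 0 };
    size_t i = 0;

    assert(buf != NULL);

    /* Consume any run of legacy prefixes. */
    while (i < len && is_legacy_prefix(buf[i])) {
        i++;
    }
    p.legacy_count = i;

    if (i < len) {
        uint8_t b = buf[i];
        if ((b & 0xF0) == 0x40) {
            /* 0x40..0x4F is a REX prefix in 64-bit mode; consume it. */
            p.rex = b;
            i++;
        } else if (b == 0xC5) {
            p.kind = PFX_VEX2;
        } else if (b == 0xC4) {
            p.kind = PFX_VEX3;
        } else if (b == 0x62) {
            p.kind = PFX_EVEX;
        } else if (b == 0xD5) {
            p.kind = PFX_REX2;
        }
    }
    p.offset = i;
    return p;
}
```

For the bytes `f3 48 0f b8 c1`, the sketch reports one legacy prefix, a REX byte of 0x48, and an offset of 2, which is where the opcode bytes begin. A real decoder would then parse the payload of any VEX/EVEX/REX2 prefix and hand the opcode to its table lookup.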
