[Qemu-devel] Improve QEMU performance with LLVM codegen and other techniques

陳韋任 Mon, 28 Nov 2011 23:04:16 -0800

Hi all,

  Our team is working on a project similar to llvm-qemu [1], which is also
based on QEMU and LLVM. Currently, process mode works fine [2], and we are
moving on to system mode.


Let me briefly introduce our framework and describe the problem we have run
into. We translate TCG IR into LLVM IR and let the LLVM JIT do the codegen.
Our framework keeps both the TCG and the LLVM codegen paths: for short-running
applications, TCG's code quality is good enough, while LLVM codegen is meant
for long-running applications. We have two code caches: the original QEMU code
cache (for basic blocks) and an LLVM code cache (for traces). A trace is the
same concept as the "super-blocks" mentioned in the discussion thread [3]; it
is composed of a set of basic blocks. The goal is to enlarge the optimization
scope so that the code generated for a trace is better than the code for its
individual basic blocks. Here is an overview of our framework.


    QEMU code cache    LLVM code cache
        (block)            (trace)

          bb1 ------------> trace1  


In our framework, once a basic block (bb1) becomes hot enough (i.e., it has
been executed many times), we start building a trace beginning with bb1 and let
LLVM do the codegen. We place the optimized code in the LLVM code cache and
patch the head of bb1 so that anyone executing bb1 jumps to trace1 directly.
Since we are moving toward system mode, we have to consider situations where
unlinking is needed. Block linking is done by QEMU itself, and we leave block
unlinking to it as well. The problem is when and where to break the link
between a block and a trace. So far I can only spot two places where we should
break the block -> trace link [4]. I don't know whether I have found them all
or missed something.

  1. cpu_unlink_tb (exec.c)

  2. tb_phys_invalidate (exec.c)
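
To make the link/unlink flow concrete, here is a minimal Python sketch. It is
a toy model, not QEMU code: the names Block, Trace, Framework, HOT_THRESHOLD
and invalidate() are hypothetical, and invalidate() only stands in for whatever
we would call from the two places listed above. It also assumes that
invalidating any block contained in a trace must drop the whole trace, which
may be more conservative than necessary.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

HOT_THRESHOLD = 3  # hypothetical value; the real threshold is tunable


@dataclass
class Trace:
    """Toy stand-in for LLVM-generated code covering several basic blocks."""
    head_pc: int
    block_pcs: List[int]


@dataclass
class Block:
    """Toy stand-in for a basic block in the QEMU code cache."""
    pc: int
    exec_count: int = 0
    trace: Optional[Trace] = None  # models the patched jump at the block head


class Framework:
    def __init__(self) -> None:
        self.blocks: Dict[int, Block] = {}
        self.traces: Dict[int, Trace] = {}

    def execute(self, pc: int, successors: List[int]) -> str:
        """Run the block at pc; report which code cache served it."""
        block = self.blocks.setdefault(pc, Block(pc))
        if block.trace is not None:
            return "trace"  # head was patched: jump straight to the trace
        block.exec_count += 1
        if block.exec_count >= HOT_THRESHOLD:
            self.build_trace(block, successors)
        return "block"

    def build_trace(self, head: Block, successors: List[int]) -> None:
        """Create a trace starting at head and link head -> trace."""
        trace = Trace(head.pc, [head.pc, *successors])
        self.traces[head.pc] = trace
        head.trace = trace

    def invalidate(self, pc: int) -> None:
        """Break every block -> trace link whose trace contains pc.

        Assumption: this models the hook we would add to the unlink and
        invalidate paths; it is not an existing QEMU function.
        """
        for head_pc, trace in list(self.traces.items()):
            if pc in trace.block_pcs:
                head = self.blocks[head_pc]
                head.trace = None    # undo the patch at the block head
                head.exec_count = 0  # let the block become hot again later
                del self.traces[head_pc]
```

In this model, a block that reaches the threshold is served from the trace on
its next execution, and invalidating any member block sends execution back to
the QEMU code cache until the head becomes hot again.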

The big problem is debugging. We test our system using images downloaded from
the website [5]. Basically, we want to see an operating system boot
successfully, then log in and run some benchmarks on it. As a very first step,
we set a very high threshold for trace building. In other words, a basic block
must be executed *many* times to trigger the trace building process. Then we
lower the threshold a bit at a time to see how things work. When something goes
wrong, we get a kernel panic or the system hangs at some point during boot.
I have no idea how to track down this kind of problem, so I'd like to ask the
list for help, experience, and suggestions. I hope I have made the situation
clear.
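
The threshold-lowering procedure can be sketched as follows. This is only an
illustration in Python: boots_ok() is a hypothetical callback that would boot
the guest with a given threshold and report success, and exec_counts is a
hypothetical per-block execution profile. In system mode the counts can differ
from run to run, so the list of suspect blocks is a hint at best.

```python
from typing import Callable, Dict, List, Optional, Tuple


def lower_until_failure(
    boots_ok: Callable[[int], bool],
    start: int,
    step: int,
    floor: int = 1,
) -> Tuple[Optional[int], Optional[int]]:
    """Lower the trace-building threshold until the guest stops booting.

    Returns (last_good, first_bad); first_bad is None if every threshold
    that was tried booted successfully.
    """
    last_good: Optional[int] = None
    threshold = start
    while threshold >= floor:
        if not boots_ok(threshold):
            return last_good, threshold
        last_good = threshold
        threshold -= step
    return last_good, None


def newly_hot_blocks(
    exec_counts: Dict[int, int], last_good: int, first_bad: int
) -> List[int]:
    """Blocks that are hot at first_bad but were not hot at last_good."""
    return sorted(
        pc for pc, count in exec_counts.items()
        if first_bad <= count < last_good
    )
```

The blocks returned by newly_hot_blocks() are the heads of the traces that
exist only in the failing configuration, which is where we would look first.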

  Thanks!

[1] http://code.google.com/p/llvm-qemu/
[2] I have to admit we have only tested our framework with the SPEC2006
    benchmarks, not with _real_ applications.
[3] http://lists.cs.uiuc.edu/pipermail/llvmdev/2008-April/013689.html
[4] http://lists.gnu.org/archive/html/qemu-devel/2011-09/msg03643.html
[5] http://wiki.qemu.org/Download

Regards,
chenwj

-- 
Wei-Ren Chen (陳韋任)
Computer Systems Lab, Institute of Information Science,
Academia Sinica, Taiwan (R.O.C.)
Tel:886-2-2788-3799 #1667
Homepage: http://people.cs.nctu.edu.tw/~chenwj
