On 29.11.2011, at 08:03, 陳韋任 wrote:

> Hi all,
>
> Our team is working on a project similar to llvm-qemu [1], which is also
> based on QEMU and LLVM. The current status is that process mode works
> fine [2], and we are moving forward to system mode.
>
> Let me briefly introduce our framework here and state the problem we have
> encountered. What we do is translate TCG IR into LLVM IR and let the LLVM
> JIT do the codegen. In our framework we have both TCG and LLVM codegen
> capability: for short-running applications, TCG's code quality is good
> enough; LLVM codegen, on the other hand, is for long-running applications.
> We have two code caches in our framework: the original QEMU code cache
> (for basic blocks) and the LLVM code cache (for traces). The concept of a
> trace is the same as that of the "super-blocks" mentioned in the
> discussion thread [3]: a trace is composed of a set of basic blocks. The
> goal is to enlarge the optimization scope, in the hope that the code
> quality of a trace is better than that of the individual basic blocks.
> Here is an overview of our framework:
>
>     QEMU code cache        LLVM code cache
>        (block)                 (trace)
>
>         bb1 ------------------> trace1
>
> In our framework, if we find that a basic block (bb1) is hot enough (i.e.,
> executed many times), we start building a trace (beginning with bb1) and
> let LLVM do the codegen. We place the optimized code in the LLVM code
> cache and patch the head of bb1 so that anyone executing bb1 will jump to
> trace1 directly.
>
> Since we're moving toward system mode, we have to consider situations
> where unlinking is needed. Block linking is done by QEMU itself, and we
> leave block unlinking to it as well. The problem is when/where to break
> the link between a block and a trace. So far I can only spot two places
> where we should break the block -> trace link [4]. I don't know whether I
> have found them all or am missing something:
>
> 1. cpu_unlink_tb (exec.c)
>
> 2. tb_phys_invalidate (exec.c)
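(To make the scheme above concrete, here is a rough sketch of the hot-counter
and block -> trace patching logic. All names — TraceBlock, HOT_THRESHOLD,
tb_note_exec, and so on — are invented for illustration; they are not actual
QEMU or llvm-qemu identifiers.)

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

/* Arbitrary example threshold; the real one would be tuned. */
#define HOT_THRESHOLD 50

typedef struct TraceBlock {
    uint32_t exec_count;  /* how many times this block has run */
    void *trace_code;     /* LLVM-generated trace, NULL until built */
} TraceBlock;

/* Called on every execution of the block; returns true once the block
   becomes hot and a trace beginning at it should be built. */
static bool tb_note_exec(TraceBlock *tb)
{
    if (tb->trace_code) {
        return false;  /* already redirected to a trace */
    }
    return ++tb->exec_count >= HOT_THRESHOLD;
}

/* "Patch the head of bb1": record the trace so future executions jump
   to the optimized code instead of the TCG-generated block. */
static void tb_link_trace(TraceBlock *tb, void *trace_code)
{
    tb->trace_code = trace_code;
}

/* Break the block -> trace link; this is what would have to happen in
   the cpu_unlink_tb()/tb_phys_invalidate() cases listed above. */
static void tb_unlink_trace(TraceBlock *tb)
{
    tb->trace_code = NULL;
    tb->exec_count = 0;
}
```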
Very cool! I have been thinking about this for a while myself. It's
especially appealing these days, since you can do the hotspot optimization
in a separate thread :).

In system mode, you also need to flush when tb_flush() is called, though.
And you have to make sure to match hflags and segment descriptors for the
links - otherwise you might end up connecting TBs from different
processes :).

> The big problem is debugging. We test our system using images downloaded
> from the web [5]. Basically, we want to see an operating system boot

For Linux, I can recommend these images:

  http://people.debian.org/~aurel32/qemu/

If you want to be more exotic (Minix found a lot of bugs for me back in the
day!), you can try the OS Zoo:

  http://www.oszoo.org/

> successfully, then log in and run some benchmarks on it. As a very first
> step, we set a very high threshold for trace building. In other words, a
> basic block must be executed *many* times to trigger the trace-building
> process. Then we lower the threshold a bit at a time and see how things
> work. When something goes wrong, we might get a kernel panic, or the
> system might hang at some point during boot. I have no idea how to solve
> this kind of problem, so I'd like to ask for help/experience/suggestions
> on the mailing list. I hope I have made the whole situation clear to you.

I don't see any better approach to debugging this than the one you're
already taking. Try to run as many workloads as you can and see if they
break :). Oh, and always make the optimization optional, so that you can
narrow a failure down to it and know you didn't hit a generic QEMU bug.

Alex
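P.S.: A minimal sketch of the hflags/segment matching I mean, before
allowing a block -> trace link. The struct and field names here are made
up for illustration (real QEMU keys a TB by pc, cs_base, and flags; this
is not its actual layout):

```c
#include <stdint.h>
#include <stdbool.h>

/* Invented key describing the context a translation was made under. */
typedef struct TBKey {
    uint64_t pc;       /* guest program counter */
    uint64_t cs_base;  /* code segment base at translation time */
    uint32_t hflags;   /* CPU mode bits baked into the translated code */
} TBKey;

/* Only link two translations whose context matches; linking across
   different hflags or segment bases could chain code translated for
   different processes or CPU modes. */
static bool tb_link_allowed(const TBKey *a, const TBKey *b)
{
    return a->hflags == b->hflags && a->cs_base == b->cs_base;
}
```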