On 29.11.2011, at 08:03, 陳韋任 wrote:

> Hi all,
>
> Our team is working on a project similar to llvm-qemu [1], which is also
> based on QEMU and LLVM. The current status is that process mode works
> fine [2], and we are moving forward to system mode.
>
> Let me briefly introduce our framework here and state the problem we have
> encountered. What we do is translate TCG IR into LLVM IR and let the LLVM
> JIT do the codegen. In our framework we have both TCG and LLVM codegen
> capability: for short-running applications, TCG's code quality is good
> enough; LLVM codegen, on the other hand, is for long-running applications.
> We have two code caches in our framework: the original QEMU code cache
> (for basic blocks) and the LLVM code cache (for traces). The concept of a
> trace is the same as that of the "super-blocks" mentioned in the
> discussion thread [3]: a trace is composed of a set of basic blocks. The
> goal is to enlarge the optimization scope, in the hope that the code
> quality of a trace is better than that of the individual basic blocks.
> Here is an overview of our framework:
>
>     QEMU code cache        LLVM code cache
>        (block)                 (trace)
>
>         bb1 ------------------> trace1
>
> In our framework, if we find that a basic block (bb1) is hot enough (i.e.,
> executed many times), we start building a trace (beginning with bb1) and
> let LLVM do the codegen. We place the optimized code in the LLVM code
> cache and patch the head of bb1 so that anyone executing bb1 will jump to
> trace1 directly.
>
> Since we're moving toward system mode, we have to consider situations
> where unlinking is needed. Block linking is done by QEMU itself, and we
> leave block unlinking to it as well. The problem is when/where to break
> the link between a block and a trace. So far I can only spot two places
> where we should break the block -> trace link [4]. I don't know whether I
> have found them all or am missing something:
>
> 1. cpu_unlink_tb (exec.c)
>
> 2. tb_phys_invalidate (exec.c)
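(To make the scheme above concrete, here is a rough sketch of the hot-counter
and block -> trace patching logic. All names — TraceBlock, HOT_THRESHOLD,
tb_note_exec, and so on — are invented for illustration; they are not actual
QEMU or llvm-qemu identifiers.)

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

/* Arbitrary example threshold; the real one would be tuned. */
#define HOT_THRESHOLD 50

typedef struct TraceBlock {
    uint32_t exec_count;  /* how many times this block has run */
    void *trace_code;     /* LLVM-generated trace, NULL until built */
} TraceBlock;

/* Called on every execution of the block; returns true once the block
   becomes hot and a trace beginning at it should be built. */
static bool tb_note_exec(TraceBlock *tb)
{
    if (tb->trace_code) {
        return false;  /* already redirected to a trace */
    }
    return ++tb->exec_count >= HOT_THRESHOLD;
}

/* "Patch the head of bb1": record the trace so future executions jump
   to the optimized code instead of the TCG-generated block. */
static void tb_link_trace(TraceBlock *tb, void *trace_code)
{
    tb->trace_code = trace_code;
}

/* Break the block -> trace link; this is what would have to happen in
   the cpu_unlink_tb()/tb_phys_invalidate() cases listed above. */
static void tb_unlink_trace(TraceBlock *tb)
{
    tb->trace_code = NULL;
    tb->exec_count = 0;
}
```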
Very cool! I have been thinking about this for a while myself. It's
especially appealing these days, since you can do the hotspot optimization
in a separate thread :).

In system mode, you also need to flush when tb_flush() is called, though.
And you have to make sure to match hflags and segment descriptors for the
links - otherwise you might end up connecting TBs from different
processes :).

> The big problem is debugging. We test our system using images downloaded
> from the web [5]. Basically, we want to see an operating system boot

For Linux, I can recommend these images:

  http://people.debian.org/~aurel32/qemu/

If you want to be more exotic (Minix found a lot of bugs for me back in the
day!), you can try the OS Zoo:

  http://www.oszoo.org/

> successfully, then log in and run some benchmarks on it. As a very first
> step, we set a very high threshold for trace building. In other words, a
> basic block must be executed *many* times to trigger the trace-building
> process. Then we lower the threshold a bit at a time and see how things
> work. When something goes wrong, we might get a kernel panic, or the
> system might hang at some point during boot. I have no idea how to solve
> this kind of problem, so I'd like to ask for help/experience/suggestions
> on the mailing list. I hope I have made the whole situation clear to you.

I don't see any better approach to debugging this than the one you're
already taking. Try to run as many workloads as you can and see if they
break :). Oh, and always make the optimization optional, so that you can
narrow a failure down to it and know you didn't hit a generic QEMU bug.

Alex
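P.S.: A minimal sketch of the hflags/segment matching I mean, before
allowing a block -> trace link. The struct and field names here are made
up for illustration (real QEMU keys a TB by pc, cs_base, and flags; this
is not its actual layout):

```c
#include <stdint.h>
#include <stdbool.h>

/* Invented key describing the context a translation was made under. */
typedef struct TBKey {
    uint64_t pc;       /* guest program counter */
    uint64_t cs_base;  /* code segment base at translation time */
    uint32_t hflags;   /* CPU mode bits baked into the translated code */
} TBKey;

/* Only link two translations whose context matches; linking across
   different hflags or segment bases could chain code translated for
   different processes or CPU modes. */
static bool tb_link_allowed(const TBKey *a, const TBKey *b)
{
    return a->hflags == b->hflags && a->cs_base == b->cs_base;
}
```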