On Fri, May 20, 2011 at 09:35:08PM +0200, Aurelien Jarno wrote:
> On Fri, May 20, 2011 at 04:39:27PM +0400, Kirill Batuzov wrote:
> > This series implements some basic machine-independent optimizations. They
> > simplify code and allow liveness analysis do it's work better.
> >
> > Suppose we have following ARM code:
> >
> > movw r12, #0xb6db
> > movt r12, #0xdb6d
> >
> > In TCG before optimizations we'll have:
> >
> > movi_i32 tmp8,$0xb6db
> > mov_i32 r12,tmp8
> > mov_i32 tmp8,r12
> > ext16u_i32 tmp8,tmp8
> > movi_i32 tmp9,$0xdb6d0000
> > or_i32 tmp8,tmp8,tmp9
> > mov_i32 r12,tmp8
> >
> > And after optimizations we'll have this:
> >
> > movi_i32 r12,$0xdb6db6db
> >
> > Here are performance evaluation results on SPEC CPU2000 integer tests in
> > user-mode emulation on x86_64 host. There were 5 runs of each test on
> > reference data set. The tables below show runtime in seconds for all these
> > runs.
>
> How are the tests done? Are they done with linux-user, or running the
> executables in qemu-system-xxx?
>
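For reference, the transformation in the quoted example can be reproduced
with a toy pass. This is only a rough sketch in Python, not the code from
the series or from TCG: the tuple representation, the function names
(fold_constants, remove_dead, optimize) and the rule that every tmpN is
dead at the end of the block are assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF

def fold_constants(ops):
    """Forward pass: rewrite an op as movi_i32 when all its inputs are known."""
    known = {}   # variable name -> known 32-bit constant
    out = []
    for op in ops:
        name, dst, *args = op
        if name == "movi_i32":
            value = args[0] & MASK32
        elif name == "mov_i32" and args[0] in known:
            value = known[args[0]]
        elif name == "ext16u_i32" and args[0] in known:
            value = known[args[0]] & 0xFFFF
        elif name == "or_i32" and args[0] in known and args[1] in known:
            value = known[args[0]] | known[args[1]]
        else:
            # Result is not a known constant: forget dst and keep the op.
            known.pop(dst, None)
            out.append(op)
            continue
        known[dst] = value
        out.append(("movi_i32", dst, value))
    return out

def remove_dead(ops, live_out):
    """Backward pass: drop ops whose destination is never read afterwards."""
    live = set(live_out)
    kept = []
    for op in reversed(ops):
        name, dst, *args = op
        if dst not in live:
            continue
        live.discard(dst)
        if name != "movi_i32":   # movi_i32 has a constant, not a variable, as input
            live.update(args)
        kept.append(op)
    kept.reverse()
    return kept

def optimize(ops, live_out):
    return remove_dead(fold_constants(ops), live_out)

# The op sequence from the quoted example (hypothetical tuple encoding).
example = [
    ("movi_i32", "tmp8", 0xB6DB),
    ("mov_i32", "r12", "tmp8"),
    ("mov_i32", "tmp8", "r12"),
    ("ext16u_i32", "tmp8", "tmp8"),
    ("movi_i32", "tmp9", 0xDB6D0000),
    ("or_i32", "tmp8", "tmp8", "tmp9"),
    ("mov_i32", "r12", "tmp8"),
]

for name, dst, value in optimize(example, live_out={"r12"}):
    print(f"{name} {dst},${value:#x}")
```

With only r12 live at the end of the block, the seven ops collapse to the
single movi_i32 shown in the quoted text. Ops whose inputs are not known
constants are passed through unchanged.
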
Another point I forgot: the current TCG code already does a very simple
copy and constant propagation, so it might be a good idea to end the
series by cleaning up that code. There is no point in doing such an
optimisation twice, and it most probably takes a non-negligible amount
of time. This way you can provide benchmarks on the whole set.

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurel...@aurel32.net                 http://www.aurel32.net