On 28 December 2011 00:43, Xin Tong <xerox.time.t...@gmail.com> wrote:
> I modified QEMU to check for interrupt status at the end of every TB
> and ran it on SPECINT2000 benchmarks with QEMU 0.15.0. The performance
> is 70% of the unmodified one for some benchmarks on an x86_64 host. I
> agree that the extra load-test-branch-not-taken per TB is minimal, but
> what I found is that the average number of TBs executed per TB enter is
> low (~3.5 TBs), while the unmodified approach has ~10 TBs per TB
> enter. This makes me wonder why. Maybe the mechanism I used to gather
> these statistics is flawed, but the performance is indeed hindered.
Since you said you're using system mode, here's my guess. The unlink-TBs
method of interrupting the guest CPU thread runs in a second thread (the
io thread) and doesn't stop the guest CPU thread. So while the io thread
is traversing the TB graph trying to unlink TBs, the CPU thread keeps
running, and may well execute a few more TBs before the io thread catches
up with it and manages to unlink the TB link the CPU thread is about to
traverse.

More generally: are we really taking an interrupt every 3 to 5 TBs? That
seems very high -- surely we would then be spending more time in the OS
servicing interrupts than running useful guest userspace code...

-- PMM