Hi,

kqemu doesn't trap the "rdtsc" instruction, for performance reasons. This is mostly okay on a uniprocessor host, but on a dual-core CPU there are effectively two TSCs and there is no guarantee that they are in sync. On my Linux desktop there happens to be about a 17-second difference between them, with 14 days of uptime and cpufreq not compiled in. If the qemu guest is a Linux using the TSC as its clocksource (the default in some configurations), this turns out to be fatal: the guest kernel believes it has only one processor, so the TSC appears to jump 17 seconds forward and back whenever qemu is migrated between the host's processors, and the guest locks up. In fact Linux locks up whenever the TSC increment is negative even a single time, so the lock-up occurs after a random amount of time from boot.
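For anyone who wants to check their own host, the skew can be observed from user space with a small program like the one below. This is just my own illustration, not part of the proposed patches; it assumes a Linux host with at least two online CPUs, and the measured difference also includes the cost of the migration itself, so only large offsets (like the ~17 s above) are meaningful.

/* tsc_skew.c - rough user-space check of the TSC offset between two host CPUs */
#define _GNU_SOURCE
#include <sched.h>
#include <stdint.h>
#include <stdio.h>

static inline uint64_t rdtsc(void)
{
    uint32_t lo, hi;
    asm volatile("rdtsc" : "=a" (lo), "=d" (hi));
    return ((uint64_t)hi << 32) | lo;
}

static uint64_t read_on_cpu(int cpu)
{
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(cpu, &mask);
    sched_setaffinity(0, sizeof(mask), &mask);   /* migrate to 'cpu' */
    return rdtsc();
}

int main(void)
{
    uint64_t t0 = read_on_cpu(0);
    uint64_t t1 = read_on_cpu(1);
    /* Signed difference: a negative value means CPU 1's TSC is behind
     * CPU 0's, which is exactly what makes a TSC-clocksource guest
     * lock up when qemu migrates between the two. */
    printf("tsc(cpu1) - tsc(cpu0) = %lld\n", (long long)(t1 - t0));
    return 0;
}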
I'm not sure what the best resolution would be. Here are two ideas; both have their downsides, but they work.

The first is to avoid using the host "rdtsc" at all. This requires a change in kqemu to trap "rdtsc" (it can probably be done in a smarter way so that a full fallback to qemu is not necessary):

--- a/common/interp.c
+++ b/common/interp.c
@@ -4641,17 +4641,7 @@ QO( case OT_LONG | 8:\
         LABEL(90) /* nop */
             goto insn_next;
         LABEL(131) /* rdtsc */
-            {
-                uint32_t low, high;
-                if ((s->cpu_state.cr4 & CR4_TSD_MASK) &&
-                    s->cpu_state.cpl != 0) {
-                    raise_exception_err(s, EXCP0D_GPF, 0);
-                }
-                asm volatile("rdtsc" : "=a" (low), "=d" (high));
-                s->regs1.eax = low;
-                s->regs1.edx = high;
-            }
-            goto insn_next;
+            raise_exception(s, KQEMU_RET_SOFTMMU);
         LABEL(105) /* syscall */
             helper_syscall(s);

and a change in qemu:

--- a/hw/pc.c
+++ b/hw/pc.c
@@ -61,17 +61,7 @@ static void ioportF0_write(void *opaque, uint32_t addr, uint32_t data)
 /* TSC handling */
 uint64_t cpu_get_tsc(CPUX86State *env)
 {
-    /* Note: when using kqemu, it is more logical to return the host TSC
-       because kqemu does not trap the RDTSC instruction for
-       performance reasons */
-#if USE_KQEMU
-    if (env->kqemu_enabled) {
-        return cpu_get_real_ticks();
-    } else
-#endif
-    {
-        return cpu_get_ticks();
-    }
+    return cpu_get_ticks();
 }

 /* SMM support */

The downside here is the performance penalty. I haven't done any benchmarks, but during my tests with a Linux guest almost all TSC reads happened in qemu rather than in kqemu, so the overhead shouldn't be significant.

The second idea is to prevent qemu from migrating between processors by setting the CPU affinity (as is already done for Windows hosts):

--- a/vl.c
+++ b/vl.c
@@ -49,6 +49,7 @@
 #endif
 #else
 #ifndef __sun__
+#define _Linux
 #include <linux/if.h>
 #include <linux/if_tun.h>
 #include <pty.h>
@@ -56,6 +57,7 @@
 #include <linux/rtc.h>
 #include <linux/ppdev.h>
 #include <linux/parport.h>
+#include <sched.h>
 #else
 #include <sys/stat.h>
 #include <sys/ethernet.h>
@@ -6884,11 +6886,25 @@ int main(int argc, char **argv)
     LIST_INIT (&vm_change_state_head);
 #ifndef _WIN32
     {
+#if defined(_Linux) && defined(USE_KQEMU)
+        cpu_set_t mask;
+        int i;
+#endif
         struct sigaction act;
         sigfillset(&act.sa_mask);
         act.sa_flags = 0;
         act.sa_handler = SIG_IGN;
         sigaction(SIGPIPE, &act, NULL);
+#if defined(_Linux) && defined(USE_KQEMU)
+        /* Force QEMU to run on a single CPU so that we can expect
+         * consistent values from "rdtsc" */
+        if (sched_getaffinity(0, sizeof(cpu_set_t), &mask) == 0) {
+            for (i = 0; !CPU_ISSET(i, &mask); i++);
+            CPU_ZERO(&mask);
+            CPU_SET(i, &mask);
+            sched_setaffinity(0, sizeof(cpu_set_t), &mask);
+        }
+#endif
     }
 #else
     SetConsoleCtrlHandler(qemu_ctrl_handler, TRUE);

This part could be moved to somewhere after kqemu is enabled so that it can be made conditional on kqemu actually being used. The approach is Linux-specific, and it forces all qemu instances to run on a single processor, so it can have an even bigger performance hit (imagine 8 qemu sessions on an 8-CPU host). It also doesn't avoid the use of "rdtsc", so the virtual TSC keeps running even when the emulator is stopped, and there is no way to implement writes to the TSC via "wrmsr".

I don't know which of these workarounds is more appropriate. Anthony Liguori had the idea of using the AMD "rdtscp" instruction, which additionally returns the CPU number, and maintaining a list of TSC offsets for each host CPU to compensate for the differences between the TSCs.
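To make the "rdtscp" idea a bit more concrete, here is a rough, untested sketch of what the compensation could look like on the host side. The helper names are hypothetical, and it assumes the CPU supports "rdtscp" and that IA32_TSC_AUX has been programmed with the CPU number in its low bits, as Linux does for its getcpu() vsyscall; filling in the offset table at startup is not shown.

/* Sketch: per-host-CPU TSC offset compensation using "rdtscp". */
#include <stdint.h>

#define MAX_HOST_CPUS 64

/* tsc_offset[i] holds the value to add to CPU i's raw TSC so that all
 * host CPUs report a common, monotonic timebase.  It would be filled in
 * once at startup by comparing each CPU's TSC against a reference CPU. */
static int64_t tsc_offset[MAX_HOST_CPUS];

static inline uint64_t rdtscp(uint32_t *aux)
{
    uint32_t lo, hi;
    /* rdtscp returns the TSC in EDX:EAX and IA32_TSC_AUX in ECX */
    asm volatile("rdtscp" : "=a" (lo), "=d" (hi), "=c" (*aux));
    return ((uint64_t)hi << 32) | lo;
}

/* What the guest would see instead of a raw host rdtsc value. */
uint64_t cpu_get_compensated_tsc(void)
{
    uint32_t aux;
    uint64_t tsc = rdtscp(&aux);
    unsigned cpu = aux & 0xfff;          /* low bits of IA32_TSC_AUX */
    return tsc + tsc_offset[cpu % MAX_HOST_CPUS];
}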
I need to use one of these two workarounds whenever I boot the Xenoppix (vmknoppix) live CD in qemu with kqemu enabled, otherwise I get a lock-up after a random period from bootup. Thanks to the #qemu channel for helping debug this.

Regards,
Andrzej