Bugs item #2494730, was opened at 2009-01-09 09:59
Message generated for change (Comment added) made by kmshanah
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2494730&group_id=180599
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Kevin Shanahan (kmshanah)
Assigned to: Nobody/Anonymous (nobody)
Summary: Guests "stalling" on kvm-82
Initial Comment:
I am seeing periodic stalls in Linux and Windows guests with kvm-82 on an IBM
X3550 server with 2 x Xeon 5130 CPUs and 32GB RAM.
I am *reasonably* certain that this is a regression somewhere between kvm-72
and kvm-82. We had been running kvm-72 (actually, the debian kvm-source
package) up until now and never noticed the problem. Now the stalls are very
obvious. When the guest stalls, at least one kvm process on the host
gobbles up 100% CPU. I'll do my debugging with the Linux guest, as that's sure
to be easier to deal with.
As a simple demonstration that the guest is unresponsive, here is the result of
me pinging the guest from another machine on the (very quiet) LAN:
--- hermes-old.wumi.org.au ping statistics ---
600 packets transmitted, 600 received, 0% packet loss, time 599659ms
rtt min/avg/max/mdev = 0.255/181.211/6291.871/558.706 ms, pipe 7
The latency varies pretty badly, with spikes up to several seconds as you can
see.
The problem is not reproducible on other VT capable hardware that I have - e.g.
my desktop has a E8400 CPU which runs the VMs just fine. Does knowing that make
it any easier to guess where the problem might be?
The Xeon 5130 does not have the "smx", "est", "sse4_1", "xsave", "vnmi" and
"flexpriority" CPU flags that the E8400 does.
Because this server is the only hardware I have which exhibits the problem and
it's a production machine, I have limited times where I can do testing.
However, I will try to confirm that kvm-72 is okay and then bisect.
Currently the host is running a 2.6.28 kernel with the kvm-82 modules. I guess
I'm likely to have problems compiling the older kvm releases against this
kernel, so I'll have to drop back to 2.6.27.something to run the tests.
CPU Vendor: Intel
CPU Type: Xeon 5130
Number of CPUs: 2
Host distribution: Debian Lenny/Sid
KVM version: kvm-82
Host kernel: Linux 2.6.28 x86_64
Guest Distribution: Debian Etch
Guest kernel: Linux 2.6.27.10 i686
Host's /proc/cpuinfo:
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Xeon(R) CPU 5130 @ 2.00GHz
stepping : 6
cpu MHz : 1995.117
cache size : 4096 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm
constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx tm2
ssse3 cx16 xtpr pdcm dca lahf_lm tpr_shadow
bogomips : 3990.23
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:
processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Xeon(R) CPU 5130 @ 2.00GHz
stepping : 6
cpu MHz : 1995.117
cache size : 4096 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm
constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx tm2
ssse3 cx16 xtpr pdcm dca lahf_lm tpr_shadow
bogomips : 3989.96
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:
processor : 2
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Xeon(R) CPU 5130 @ 2.00GHz
stepping : 6
cpu MHz : 1995.117
cache size : 4096 KB
physical id : 3
siblings : 2
core id : 0
cpu cores : 2
apicid : 6
initial apicid : 6
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm
constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx tm2
ssse3 cx16 xtpr pdcm dca lahf_lm tpr_shadow
bogomips : 3990.01
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:
processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Xeon(R) CPU 5130 @ 2.00GHz
stepping : 6
cpu MHz : 1995.117
cache size : 4096 KB
physical id : 3
siblings : 2
core id : 1
cpu cores : 2
apicid : 7
initial apicid : 7
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm
constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx tm2
ssse3 cx16 xtpr pdcm dca lahf_lm tpr_shadow
bogomips : 3990.01
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:
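The flag comparison mentioned in the initial comment can be checked mechanically against the dump above. The following is only a sketch: the helper names parse_flags and missing_flags are made up for illustration, the Xeon flags are copied from this report, and the E8400 side is limited to the six flags named in the report because its full /proc/cpuinfo is not included here.

```python
def parse_flags(cpuinfo_flags_line):
    """Return the set of flag names from a /proc/cpuinfo 'flags' value."""
    # Accept either the bare value or the full "flags : ..." line.
    _, _, value = cpuinfo_flags_line.rpartition(":")
    return set(value.split())


def missing_flags(candidates, host_flags):
    """Return the candidate flags that the host does not advertise, sorted."""
    return sorted(set(candidates) - set(host_flags))


# Flags copied from the Xeon 5130 /proc/cpuinfo output in this report.
XEON_5130_FLAGS = parse_flags(
    "flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov "
    "pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm "
    "constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl "
    "vmx tm2 ssse3 cx16 xtpr pdcm dca lahf_lm tpr_shadow"
)

# The six flags the report names as present on the E8400 only.
E8400_ONLY = ["smx", "est", "sse4_1", "xsave", "vnmi", "flexpriority"]

# Every name printed here is absent from the Xeon 5130 flag list.
print(missing_flags(E8400_ONLY, XEON_5130_FLAGS))
```

On a live host the same helpers could be fed the "flags" line read from /proc/cpuinfo instead of the pasted string.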
----------------------------------------------------------------------
>Comment By: Kevin Shanahan (kmshanah)
Date: 2009-01-09 22:12
Message:
I've done some more testing tonight and unfortunately it's not quite as
easy to reproduce as I had hoped. Here's what I got so far:
Test1 - Linux 2.6.27.10, kvm-82
$ ping -c 600 hermes-old
(with no other guests running)
--- hermes-old.wumi.org.au ping statistics ---
600 packets transmitted, 600 received, 0% packet loss, time 599283ms
rtt min/avg/max/mdev = 0.211/0.407/7.145/0.345 ms
(three other guests)
--- hermes-old.wumi.org.au ping statistics ---
600 packets transmitted, 600 received, 0% packet loss, time 599423ms
rtt min/avg/max/mdev = 0.194/0.497/6.364/0.290 ms
(all guests running)
--- hermes-old.wumi.org.au ping statistics ---
600 packets transmitted, 600 received, 0% packet loss, time 599506ms
rtt min/avg/max/mdev = 0.254/0.577/22.347/0.932 ms
Test2 - Linux 2.6.28, kvm-82
(with no other guests running)
--- hermes-old.wumi.org.au ping statistics ---
600 packets transmitted, 600 received, 0% packet loss, time 599367ms
rtt min/avg/max/mdev = 0.206/0.408/6.768/0.279 ms
(three other guests)
--- hermes-old.wumi.org.au ping statistics ---
600 packets transmitted, 600 received, 0% packet loss, time 599347ms
rtt min/avg/max/mdev = 0.254/0.582/77.566/3.166 ms
(all guests running)
--- hermes-old.wumi.org.au ping statistics ---
600 packets transmitted, 600 received, 0% packet loss, time 599681ms
rtt min/avg/max/mdev = 0.256/328.184/10335.007/1024.143 ms, pipe 11
So, the problem did surface again but only the one time. It could be a
2.6.28 specific thing, a bad interaction between the running guests,
something that triggers after a certain amount of guest/host uptime... who
knows. I need to find a more reliable way to trigger it before I will be
able to bisect. Any suggestions welcome.
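To compare runs like the ones above without eyeballing them, the ping summary lines can be parsed and checked against a threshold. This is a rough sketch, not a proven reproducer: the function names and the 1000 ms cut-off are arbitrary assumptions, and the regular expression only targets the "rtt min/avg/max/mdev" line format shown in this report.

```python
import re

# Matches the summary line printed by ping, as quoted in this report.
RTT_RE = re.compile(
    r"rtt min/avg/max/mdev = ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms"
)


def parse_rtt(summary_line):
    """Parse an 'rtt min/avg/max/mdev' line into a dict of floats (ms)."""
    match = RTT_RE.search(summary_line)
    if match is None:
        raise ValueError("not an rtt summary line: %r" % summary_line)
    keys = ("min", "avg", "max", "mdev")
    return dict(zip(keys, (float(g) for g in match.groups())))


def looks_stalled(stats, max_threshold_ms=1000.0):
    """Assumed rule: flag a run if any reply took longer than the threshold."""
    return stats["max"] > max_threshold_ms


# Summary lines copied from the "all guests" and "three other guests" runs.
RUNS = {
    "2.6.27.10, all guests":
        "rtt min/avg/max/mdev = 0.254/0.577/22.347/0.932 ms",
    "2.6.28, three other guests":
        "rtt min/avg/max/mdev = 0.254/0.582/77.566/3.166 ms",
    "2.6.28, all guests":
        "rtt min/avg/max/mdev = 0.256/328.184/10335.007/1024.143 ms, pipe 11",
}

for label, line in RUNS.items():
    stats = parse_rtt(line)
    verdict = "STALL" if looks_stalled(stats) else "ok"
    print("%-28s max=%10.3f ms  %s" % (label, stats["max"], verdict))
```

With the figures above, only the 2.6.28 run with all guests exceeds the assumed threshold. A check like this could serve as the pass/fail test for each bisect step once the stall can be triggered reliably.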
----------------------------------------------------------------------
Comment By: Kevin Shanahan (kmshanah)
Date: 2009-01-09 11:17
Message:
I forgot to give the command line for the Linux guest:
/usr/local/kvm/bin/qemu-system-x86_64 \
-smp 2 \
-localtime -m 2048 \
-hda kvm-17-1.img \
-hdb kvm-17-tmp.img \
-net nic,vlan=0,macaddr=52:54:00:12:34:67,model=rtl8139 \
-net tap,vlan=0,ifname=tap17,script=no \
-vnc 127.0.0.1:17 -usbdevice tablet \
-daemonize
----------------------------------------------------------------------