It looks like the SCSI driver is causing the problems. QEMU's SCSI emulation is known to be broken; please use IDE or virtio-blk instead.
Jes

--
qemu-kvm 0.12.4+dfsg-1 from debian squeeze crashes "BUG: unable to handle kernel NULL pointer" (sym53c8xx)
https://bugs.launchpad.net/bugs/587993
You received this bug notification because you are a member of qemu-devel-ml, which is subscribed to QEMU.

Status in QEMU: Incomplete

Bug description:
I use eucalyptus software (1.6.2) on debian squeeze with kvm 0.12.4+dfsg-1 (the same happened with 0.11.1+dfsg-1). Kernel 2.6.32-3-amd64.

After a few days machines crash. There are no logs in the host system. The guest runs the same kernel and OS as the host. The kvm process uses 100% of cpu time. I cannot even ping the guest. Everything works fine with 2.6.30-2-amd64 and 2.6.32-trunk-amd64. The problem occurs only with 2.6.32-3-amd64 and 2.6.32-5-amd64.

Here is the log from the virtual machine:

[ 3577.816666] sd 0:0:0:0: [sda] ABORT operation started
[ 3582.816047] sd 0:0:0:0: ABORT operation timed-out.
[ 3582.816781] sd 0:0:0:0: [sda] ABORT operation started
[ 3587.816649] sd 0:0:0:0: ABORT operation timed-out.
[ 3587.817379] sd 0:0:0:0: [sda] DEVICE RESET operation started
[ 3592.816062] sd 0:0:0:0: DEVICE RESET operation timed-out.
[ 3592.816882] sd 0:0:0:0: [sda] BUS RESET operation started
[ 3592.820056] sym0: SCSI BUS reset detected.
[ 3592.831538] sym0: SCSI BUS has been reset.
[ 3592.831968] BUG: unable to handle kernel NULL pointer dereference at 0000000000000358
[ 3592.832003] IP: [<ffffffffa01147c4>] sym_int_sir+0x62f/0x14e0 [sym53c8xx]
[ 3592.832003] PGD 5f73e067 PUD 5fa53067 PMD 0
[ 3592.832003] Oops: 0000 [#1] SMP
[ 3592.832003] last sysfs file: /sys/devices/pci0000:00/0000:00:05.0/host0/target0:0:0/0:0:0:0/vendor
[ 3592.832003] CPU 0
[ 3592.832003] Modules linked in: dm_mod openafs(P) ext2 snd_pcsp snd_pcm snd_timer serio_raw i2c_piix4 snd virtio_balloon evdev i2c_core soundcore psmouse button processor snd_page_alloc ext3 jbd mbcache sd_mod crc_t10dif ata_generic libata ide_pci_generic sym53c8xx scsi_transport_spi thermal piix uhci_hcd ehci_hcd floppy thermal_sys scsi_mod virtio_pci virtio_ring virtio e1000 ide_core usbcore nls_base [last unloaded: scsi_wait_scan]
[ 3592.832003] Pid: 193, comm: scsi_eh_0 Tainted: P 2.6.32-3-amd64 #1 Bochs
[ 3592.832003] RIP: 0010:[<ffffffffa01147c4>] [<ffffffffa01147c4>] sym_int_sir+0x62f/0x14e0 [sym53c8xx]
[ 3592.832003] RSP: 0018:ffff880001803cb0 EFLAGS: 00010287
[ 3592.832003] RAX: 000000000000000a RBX: 000000000000000b RCX: 000000005f410090
[ 3592.832003] RDX: 0000000000000000 RSI: ffff88005c450800 RDI: ffffc90000a5e006
[ 3592.832003] RBP: ffff88005f410000 R08: 0000000000000000 R09: 0000000000000000
[ 3592.832003] R10: 000000000000003a R11: ffffffff813b871e R12: ffff88005f410090
[ 3592.832003] R13: 0000000000000084 R14: 0000000000000000 R15: 0000000000000001
[ 3592.832003] FS: 0000000000000000(0000) GS:ffff880001800000(0000) knlGS:0000000000000000
[ 3592.832003] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[ 3592.832003] CR2: 0000000000000358 CR3: 000000005e269000 CR4: 00000000000006f0
[ 3592.832003] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3592.832003] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 3592.832003] Process scsi_eh_0 (pid: 193, threadinfo ffff88005f6fa000, task ffff88005f697880)
[ 3592.832003] Stack:
[ 3592.832003] ffff88005f3fd000 0000000000000000 0000000000000130 0000000000000000
[ 3592.832003] <0> ffff88005f407710 ffffc90000a64710 ffffffffffffff10 ffffffff81195301
[ 3592.832003] <0> 0000000000000010 0000000000010212 ffff880001803d18 0000000000000018
[ 3592.832003] Call Trace:
[ 3592.832003] <IRQ>
[ 3592.832003] [<ffffffff81195301>] ? __memcpy_toio+0x9/0x19
[ 3592.832003] [<ffffffffa01164ed>] ? sym_interrupt+0x46c/0x6a3 [sym53c8xx]
[ 3592.832003] [<ffffffff8103fea0>] ? update_curr+0xa6/0x147
[ 3592.832003] [<ffffffffa010fbde>] ? sym53c8xx_intr+0x43/0x6a [sym53c8xx]
[ 3592.832003] [<ffffffff81093bfc>] ? handle_IRQ_event+0x58/0x126
[ 3592.832003] [<ffffffff810954e2>] ? handle_fasteoi_irq+0x7d/0xb5
[ 3592.832003] [<ffffffff81013957>] ? handle_irq+0x17/0x1d
[ 3592.832003] [<ffffffff81012fb1>] ? do_IRQ+0x57/0xb6
[ 3592.832003] [<ffffffff810114d3>] ? ret_from_intr+0x0/0x11
[ 3592.832003] [<ffffffff81053903>] ? __do_softirq+0x6e/0x19f
[ 3592.832003] [<ffffffff8106fa87>] ? tick_dev_program_event+0x2d/0x95
[ 3592.832003] [<ffffffff81011cac>] ? call_softirq+0x1c/0x30
[ 3592.832003] [<ffffffff81013903>] ? do_softirq+0x3f/0x7c
[ 3592.832003] [<ffffffff810537e1>] ? irq_exit+0x36/0x76
[ 3592.832003] [<ffffffff81025837>] ? smp_apic_timer_interrupt+0x87/0x95
[ 3592.832003] [<ffffffff81011673>] ? apic_timer_interrupt+0x13/0x20
[ 3592.832003] <EOI>
[ 3592.832003] [<ffffffff8118e009>] ? delay_tsc+0x0/0x73
[ 3592.832003] [<ffffffffa010f900>] ? sym_eh_handler+0x22e/0x2e2 [sym53c8xx]
[ 3592.832003] [<ffffffffa008e5de>] ? scsi_try_bus_reset+0x50/0xd9 [scsi_mod]
[ 3592.832003] [<ffffffffa008f565>] ? scsi_eh_ready_devs+0x50c/0x781 [scsi_mod]
[ 3592.832003] [<ffffffffa008fd6b>] ? scsi_error_handler+0x3c1/0x5b5 [scsi_mod]
[ 3592.832003] [<ffffffffa008f9aa>] ? scsi_error_handler+0x0/0x5b5 [scsi_mod]
[ 3592.832003] [<ffffffff81064789>] ? kthread+0x79/0x81
[ 3592.832003] [<ffffffff81011baa>] ? child_rip+0xa/0x20
[ 3592.832003] [<ffffffff81064710>] ? kthread+0x0/0x81
[ 3592.832003] [<ffffffff81011ba0>] ? child_rip+0x0/0x20
[ 3592.832003] Code: 48 c7 c7 82 92 11 a0 eb 63 48 8b 98 38 01 00 00 48 8d b8 28 01 00 00 e8 df d5 0f e1 48 89 da 48 89 c6 48 c7 c7 bc 92 11 a0 eb 6d <49> 8b 96 58 03 00 00 48 8b 82 80 00 00 00 48 8b a8 b0 00 00 00
[ 3592.832003] RIP [<ffffffffa01147c4>] sym_int_sir+0x62f/0x14e0 [sym53c8xx]
[ 3592.832003] RSP <ffff880001803cb0>
[ 3592.832003] CR2: 0000000000000358
[ 3592.867935] ---[ end trace 06f90ebbdbd172ee ]---
[ 3592.868360] Kernel panic - not syncing: Fatal exception in interrupt
[ 3592.868906] Pid: 193, comm: scsi_eh_0 Tainted: P D 2.6.32-3-amd64 #1
[ 3592.869511] Call Trace:
[ 3592.869727] <IRQ> [<ffffffff812ed349>] ? panic+0x86/0x141
[ 3592.870225] [<ffffffff81011673>] ? apic_timer_interrupt+0x13/0x20
[ 3592.870778] [<ffffffff811afbdc>] ? dummycon_dummy+0x0/0x3
[ 3592.871250] [<ffffffff81014a37>] ? oops_end+0x64/0xb4
[ 3592.871694] [<ffffffff81014a7a>] ? oops_end+0xa7/0xb4
[ 3592.872150] [<ffffffff810322b8>] ? no_context+0x1e9/0x1f8
[ 3592.872626] [<ffffffff8103246d>] ? __bad_area_nosemaphore+0x1a6/0x1ca
[ 3592.873185] [<ffffffff8106807c>] ? up+0xe/0x36
[ 3592.873576] [<ffffffff8104e219>] ? release_console_sem+0x17e/0x1af
[ 3592.874125] [<ffffffff81024d72>] ? lapic_next_event+0x18/0x1d
[ 3592.874642] [<ffffffff812ef595>] ? page_fault+0x25/0x30
[ 3592.875103] [<ffffffffa01147c4>] ? sym_int_sir+0x62f/0x14e0 [sym53c8xx]
[ 3592.875678] [<ffffffff81195301>] ? __memcpy_toio+0x9/0x19
[ 3592.876162] [<ffffffffa01164ed>] ? sym_interrupt+0x46c/0x6a3 [sym53c8xx]
[ 3592.876748] [<ffffffff8103fea0>] ? update_curr+0xa6/0x147
[ 3592.877224] [<ffffffffa010fbde>] ? sym53c8xx_intr+0x43/0x6a [sym53c8xx]
[ 3592.877800] [<ffffffff81093bfc>] ? handle_IRQ_event+0x58/0x126
[ 3592.878319] [<ffffffff810954e2>] ? handle_fasteoi_irq+0x7d/0xb5
[ 3592.878848] [<ffffffff81013957>] ? handle_irq+0x17/0x1d
[ 3592.879305] [<ffffffff81012fb1>] ? do_IRQ+0x57/0xb6
[ 3592.879744] [<ffffffff810114d3>] ? ret_from_intr+0x0/0x11
[ 3592.880237] [<ffffffff81053903>] ? __do_softirq+0x6e/0x19f
[ 3592.880723] [<ffffffff8106fa87>] ? tick_dev_program_event+0x2d/0x95
[ 3592.881284] [<ffffffff81011cac>] ? call_softirq+0x1c/0x30
[ 3592.881762] [<ffffffff81013903>] ? do_softirq+0x3f/0x7c
[ 3592.882230] [<ffffffff810537e1>] ? irq_exit+0x36/0x76
[ 3592.882691] [<ffffffff81025837>] ? smp_apic_timer_interrupt+0x87/0x95
[ 3592.883258] [<ffffffff81011673>] ? apic_timer_interrupt+0x13/0x20
[ 3592.883795] <EOI> [<ffffffff8118e009>] ? delay_tsc+0x0/0x73
[ 3592.884319] [<ffffffffa010f900>] ? sym_eh_handler+0x22e/0x2e2 [sym53c8xx]
[ 3592.884917] [<ffffffffa008e5de>] ? scsi_try_bus_reset+0x50/0xd9 [scsi_mod]
[ 3592.885522] [<ffffffffa008f565>] ? scsi_eh_ready_devs+0x50c/0x781 [scsi_mod]
[ 3592.886152] [<ffffffffa008fd6b>] ? scsi_error_handler+0x3c1/0x5b5 [scsi_mod]
[ 3592.886789] [<ffffffffa008f9aa>] ? scsi_error_handler+0x0/0x5b5 [scsi_mod]
[ 3592.887398] [<ffffffff81064789>] ? kthread+0x79/0x81
[ 3592.887836] [<ffffffff81011baa>] ? child_rip+0xa/0x20
[ 3592.888290] [<ffffffff81064710>] ? kthread+0x0/0x81
[ 3592.888721] [<ffffffff81011ba0>] ? child_rip+0x0/0x20

Unfortunately, I have no idea how to reproduce the problem.

Log from /var/log/libvirt/qemu/:

lsi_scsi: error: Unimplemented message 0x0c

What is more, I had 7 VMs running. Today four of them crashed at the same time. The rest survived with something like this in syslog:

[651330.816043] sd 0:0:0:0: [sda] ABORT operation started
[651335.860027] sd 0:0:0:0: ABORT operation timed-out.
[651335.860600] sd 0:0:0:0: [sda] ABORT operation started
[651337.019355] sd 0:0:0:0: ABORT operation complete.
[651337.038506] sd 0:0:0:0: [sda] ABORT operation started
[651337.039100] sd 0:0:0:0: ABORT operation failed.
[651337.039624] sd 0:0:0:0: [sda] ABORT operation started
[651337.040303] sd 0:0:0:0: ABORT operation failed.
[651337.040834] sd 0:0:0:0: [sda] ABORT operation started
[651337.041417] sd 0:0:0:0: ABORT operation failed.
[651337.041949] sd 0:0:0:0: [sda] ABORT operation started
[651337.042534] sd 0:0:0:0: ABORT operation failed.
[651337.043072] sd 0:0:0:0: [sda] DEVICE RESET operation started
[651337.043834] scsi target0:0:0: control msgout: c.
[651337.520075] scsi target0:0:0: has been reset
[651337.521726] sd 0:0:0:0: DEVICE RESET operation complete.
[651337.522495] sd 0:0:0:0: M_REJECT received (0:0).

It looks like the problem is in the host system and affects all machines at the same time. I have found the same pattern in syslog on the machines which crashed. It appeared 3 days before the crash. There is no information in the host log files at all.

Is it possible that eucalyptus (1.6.2) caused this? With 1.6.1 I didn't have these problems.

Eucalyptus runs kvm (0.12 and 0.11) with these commands:

/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin HOME=/root USER=root LOGNAME=root /usr/bin/kvm -S -M pc-0.12 -enable-kvm -m 512 -smp 1,sockets=1,cores=1,threads=1 -name i-35B80630 -uuid 7e9b2fc1-9a9d-7114-3cb4-f4fdb3d51a3a -nographic -nodefaults -chardev socket,id=monitor,path=/var/lib/libvirt/qemu/i-35B80630.monitor,server,nowait -mon chardev=monitor,mode=readline -rtc base=utc -boot c -kernel /var/lib/eucalyptus/instances/winnie/i-35B80630/kernel -initrd /var/lib/eucalyptus/instances/winnie/i-35B80630/ramdisk -append root=/dev/sda1 console=ttyS0 -device lsi,id=scsi0,bus=pci.0,addr=0x5 -drive file=/var/lib/eucalyptus/instances/winnie/i-35B80630/disk,if=none,id=drive-scsi0-0-0,boot=on -device scsi-disk,bus=scsi0.0,scsi-id=0,drive=drive-scsi0-0-0,id=scsi0-0-0 -device e1000,vlan=0,id=net0,mac=d0:0d:35:b8:06:30,bus=pci.0,addr=0x4 -net tap,fd=43,vlan=0,name=hostnet0 -chardev file,id=serial0,path=/var/lib/eucalyptus/instances/winnie/i-35B80630/console.log -device isa-serial,chardev=serial0 -usb -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3

/usr/bin/kvm -S -M pc-0.11 -enable-kvm -m 512 -smp 1 -name i-492407F3 -uuid b2dc266e-a62a-4e13-3847-f9104eba4135 -nographic -monitor unix:/var/lib/libvirt/qemu/i-492407F3.monitor,server,nowait -boot c -kernel /var/lib/eucalyptus/instances/admin/i-492407F3/kernel -initrd /var/lib/eucalyptus/instances/admin/i-492407F3/ramdisk -append root=/dev/sda1 console=ttyS0 -drive file=/var/lib/eucalyptus/instances/admin/i-492407F3/disk,if=scsi,bus=0,unit=0,boot=on -net nic,macaddr=d0:0d:49:24:07:f3,vlan=0,model=e1000,name=net0 -net tap,fd=118,vlan=0,name=hostnet0 -serial file:/var/lib/eucalyptus/instances/admin/i-492407F3/console.log -parallel none -usb -vga none -balloon virtio

I can give access to the VM.
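
Since the report says the surviving guests showed the same ABORT / DEVICE RESET pattern days before the crash, it may help to scan guest syslogs for it ahead of time. The following is only a rough sketch: the function name `summarize_scsi_errors` and the regular expressions are assumptions written against the log lines quoted in this report, not part of any existing tool, and other kernel versions may format these messages differently.

```python
import re

# Matches the SCSI error-handler lines quoted in this report, e.g.
#   [ 3577.816666] sd 0:0:0:0: [sda] ABORT operation started
#   [ 3592.816062] sd 0:0:0:0: DEVICE RESET operation timed-out.
EH_STEP = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+sd \S+ (?:\[\w+\] )?"
    r"(?P<op>ABORT|DEVICE RESET|BUS RESET) operation "
    r"(?P<state>started|timed-out|failed|complete)"
)

# Matches the oops header seen right after the bus reset in the crashed guest.
OOPS = re.compile(r"BUG: unable to handle kernel NULL pointer dereference")


def summarize_scsi_errors(lines):
    """Count (operation, state) pairs and report whether an oops was logged."""
    counts = {}
    oops_seen = False
    for line in lines:
        match = EH_STEP.search(line)
        if match:
            key = (match.group("op"), match.group("state"))
            counts[key] = counts.get(key, 0) + 1
        if OOPS.search(line):
            oops_seen = True
    return counts, oops_seen
```

A guest whose log yields any `timed-out` or `failed` entries is showing the pattern described above, for example when fed with `summarize_scsi_errors(open("/var/log/syslog"))` inside the guest (the path is an assumption and depends on the guest's syslog configuration).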