On 14.06.2012, at 19:13, "Richard W.M. Jones" <rjo...@redhat.com> wrote:
> On Thu, Jun 14, 2012 at 05:58:04PM +0200, Alexander Graf wrote:
>> [CC'ing qemu-ppc]
>>
>> On 06/14/2012 05:52 PM, Richard W.M. Jones wrote:
>>> I found last week that qemu-system-ppc64 (from git) hangs occasionally
>>> under load, and I have a reproducer for it now. Unfortunately the
>>> reproducer really takes a long time to run -- usually I can get a hang
>>> in under 12 hours.
>>>
>>> Here is the reproducer case:
>>>
>>> https://lists.fedoraproject.org/pipermail/ppc/2012-June/001698.html
>>>
>>> Notes:
>>>
>>> (1) Verified by one other person (other than me). Happens on both
>>> ppc64 and x86-64 host.
>>>
>>> (2) Happens with both Fedora guest kernel 3.3.4-5.fc17.ppc64 and kernel
>>> 3.5.0 that I compiled myself. The test case above contains 3.3.4-5.
>>>
>>> (3) Seems to be a problem in qemu, not the guest. The reason I think
>>> this is because I tried to capture a backtrace of the hang using
>>> remote gdb, but gdb just hung when trying to connect to qemu
>>> (gdb connects fine before the bug happens).
>>>
>>> (4) Judging by guest messages, appears to be happening when writing
>>> to the disk.
>>
>> Can you please try to see if you can reproduce this using vscsi /
>> vio instead of virtio? I couldn't quite see why vio would be any
>> more stable than virtio though ...
>
> I just tried virtio-scsi, but only the first disk shows up. I added
> two disks. See below for detailed logs. This works fine on x86-64.
> Should I file a separate bug for this?
>
>> Also, could you please try and see if it works reliably using KVM?
>> Maybe we're just encountering some TCG breakage here.
>
> I will try this, but as discussed on IRC last week there's some
> problem with the Fedora host kernel where /dev/kvm doesn't show up,
> even though the kernel is supposedly compiled with KVM PR enabled. So
> I need to fix that first.
>
> Rich.
>
> virtio scsi on ppc64
> --------------------
>
> qemu command line:
>
> /home/rjones/d/qemu/ppc64-softmmu/qemu-system-ppc64 \
> -global virtio-blk-pci.scsi=off \
> -nodefconfig \
> -nodefaults \
> -nographic \
> -device virtio-scsi-pci,id=scsi \
> -drive file=test1.img,cache=off,format=raw,id=hd0,if=none \
> -device scsi-hd,drive=hd0 \

Don't you have to specify bus= too?

Alex

> -drive
> file=/home/rjones/d/libguestfs/.guestfs-1000/root.26645,snapshot=on,id=appliance,if=none,cache=unsafe
> \
> -device scsi-hd,drive=appliance \
> -M pseries \
> -enable-kvm \
> -machine accel=kvm:tcg \
> -m 500 \
> -no-reboot \
> -device virtio-serial \
> -serial stdio \
> -chardev
> socket,path=/home/rjones/d/libguestfs/libguestfscoRCTO/guestfsd.sock,id=channel0
> \
> -device virtserialport,chardev=channel0,name=org.libguestfs.channel.0 \
> -kernel /home/rjones/d/libguestfs/.guestfs-1000/kernel.26645 \
> -initrd /home/rjones/d/libguestfs/.guestfs-1000/initrd.26645 \
> -append 'panic=1 console=ttyS0 udevtimeout=600 no_timer_check acpi=off
> printk.time=1 cgroup_disable=memory root=/dev/sdb selinux=0 guestfs_verbose=1
> TERM=screen '
>
> guest kernel output:
>
> Welcome to Open Firmware
>
> Copyright (c) 2004, 2011 IBM Corporation All rights reserved.
> This program and the accompanying materials are made available
> under the terms of the BSD License available at
> http://www.opensource.org/licenses/bsd-license.php
>
> Booting from memory...
> OF stdout device is: /vdevice/vty@1000
> Preparing to boot Linux version 3.3.4-5.fc17.ppc64
> (mockbu...@ppc-builder2.qa.fedoraproject.org) (gcc version 4.7.0 20120504
> (Red Hat 4.7.0-4) (GCC) ) #1 SMP Mon May 14 10:18:37 MST 2012
> Detected machine type: 0000000000000101
> Max number of cores passed to firmware: 1024 (NR_CPUS = 1024)
> Calling ibm,client-architecture-support... not implemented
> couldn't open /packages/elf-loader
> command line: panic=1 console=ttyS0 udevtimeout=600 no_timer_check acpi=off
> printk.time=1 cgroup_disable=memory root=/dev/sdb selinux=0 guestfs_verbose=1
> TERM=screen
> memory layout at init:
> memory_limit : 0000000000000000 (16 MB aligned)
> alloc_bottom : 0000000001a50000
> alloc_top : 000000001f400000
> alloc_top_hi : 000000001f400000
> rmo_top : 000000001f400000
> ram_top : 000000001f400000
> instantiating rtas at 0x000000001cff0000... done
> Querying for OPAL presence... not there.
> boot cpu hw idx 0
> copying OF device tree...
> Building dt strings...
> Building dt structure...
> Device tree strings 0x0000000001c60000 -> 0x0000000001c605e0
> Device tree struct 0x0000000001c70000 -> 0x0000000001c80000
> Calling quiesce...
> returning from prom_init
> [ 0.000000] Phyp-dump not supported on this hardware
> [ 0.000000] Using pSeries machine description
> [ 0.000000] Using 1TB segments
> [ 0.000000] Found initrd at 0xc000000001a50000:0xc000000001b7c400
> [ 0.000000] bootconsole [udbg0] enabled
> [ 0.000000] CPU maps initialized for 1 thread per core
> [ 0.000000] Starting Linux PPC64 #1 SMP Mon May 14 10:18:37 MST 2012
> [ 0.000000] -----------------------------------------------------
> [ 0.000000] ppc64_pft_size = 0x18
> [ 0.000000] physicalMemorySize = 0x1f400000
> [ 0.000000] htab_hash_mask = 0x1ffff
> [ 0.000000] -----------------------------------------------------
> [ 0.000000] Initializing cgroup subsys cpuset
> [ 0.000000] Initializing cgroup subsys cpu
> [ 0.000000] Linux version 3.3.4-5.fc17.ppc64
> (mockbu...@ppc-builder2.qa.fedoraproject.org) (gcc version 4.7.0 20120504
> (Red Hat 4.7.0-4) (GCC) ) #1 SMP Mon May 14 10:18:37 MST 2012
>
> CF000012
> Setup Arch[ 0.000000] [boot]0012 Setup Arch
> [ 0.000000] PCI host bridge /pci@800000020000001,0 ranges:
> [ 0.000000] IO 0x0000010080000000..0x000001008000ffff ->
> 0x0000000000000000
> [ 0.000000] MEM 0x00000100a0000000..0x00000100bfffffff ->
> 0x0000000080000000
> [ 0.000000] Zone PFN ranges:
> [ 0.000000] DMA 0x00000000 -> 0x00001f40
> [ 0.000000] Normal empty
> [ 0.000000] Movable zone start PFN for each node
> [ 0.000000] Early memory PFN ranges
> [ 0.000000] 0: 0x00000000 -> 0x00001f40
>
> CF000015
> Setup Done[ 0.000000] [boot]0015 Setup Done
> [ 0.000000] PERCPU: Embedded 2 pages/cpu @c000000001d00000 s84608 r0
> d46464 u1048576
> [ 0.000000] Built 1 zonelists in Node order, mobility grouping on. Total
> pages: 7993
> [ 0.000000] Policy zone: DMA
> [ 0.000000] Kernel command line: panic=1 console=ttyS0 udevtimeout=600
> no_timer_check acpi=off printk.time=1 cgroup_disable=memory root=/dev/sdb
> selinux=0 guestfs_verbose=1 TERM=screen
> [ 0.000000] Disabling memory control group subsystem
> [ 0.000000] PID hash table entries: 2048 (order: -2, 16384 bytes)
> [ 0.000000] freeing bootmem node 0
> [ 0.000000] Memory: 486336k/512000k available (17920k kernel code, 25664k
> reserved, 1856k data, 2952k bss, 6656k init)
> [ 0.000000] SLUB: Genslabs=19, HWalign=128, Order=0-3, MinObjects=0,
> CPUs=1, Nodes=256
> [ 0.000000] Hierarchical RCU implementation.
> [ 0.000000] NR_IRQS:512 nr_irqs:512 16
> [ 0.000000] clocksource: timebase mult[1f40000] shift[24] registered
> [ 0.000000] Console: colour dummy device 80x25
> [ 0.000000] Phyp-dump not supported on this hardware
> [ 0.000000] Using pSeries machine description
> [ 0.000000] Using 1TB segments
> [ 0.000000] Found initrd at 0xc000000001a50000:0xc000000001b7c400
> [ 0.000000] bootconsole [udbg0] enabled
> [ 0.000000] CPU maps initialized for 1 thread per core
> [ 0.000000] Starting Linux PPC64 #1 SMP Mon May 14 10:18:37 MST 2012
> [ 0.000000] -----------------------------------------------------
> [ 0.000000] ppc64_pft_size = 0x18
> [ 0.000000] physicalMemorySize = 0x1f400000
> [ 0.000000] htab_hash_mask = 0x1ffff
> [ 0.000000] -----------------------------------------------------
> [ 0.000000] Initializing cgroup subsys cpuset
> [ 0.000000] Initializing cgroup subsys cpu
> [ 0.000000] Linux version 3.3.4-5.fc17.ppc64
> (mockbu...@ppc-builder2.qa.fedoraproject.org) (gcc version 4.7.0 20120504
> (Red Hat 4.7.0-4) (GCC) ) #1 SMP Mon May 14 10:18:37 MST 2012
> [ 0.000000] [boot]0012 Setup Arch
> [ 0.000000] PCI host bridge /pci@800000020000001,0 ranges:
> [ 0.000000] IO 0x0000010080000000..0x000001008000ffff ->
> 0x0000000000000000
> [ 0.000000] MEM 0x00000100a0000000..0x00000100bfffffff ->
> 0x0000000080000000
> [ 0.000000] Zone PFN ranges:
> [ 0.000000] DMA 0x00000000 -> 0x00001f40
> [ 0.000000] Normal empty
> [ 0.000000] Movable zone start PFN for each node
> [ 0.000000] Early memory PFN ranges
> [ 0.000000] 0: 0x00000000 -> 0x00001f40
> [ 0.000000] [boot]0015 Setup Done
> [ 0.000000] PERCPU: Embedded 2 pages/cpu @c000000001d00000 s84608 r0
> d46464 u1048576
> [ 0.000000] Built 1 zonelists in Node order, mobility grouping on. Total
> pages: 7993
> [ 0.000000] Policy zone: DMA
> [ 0.000000] Kernel command line: panic=1 console=ttyS0 udevtimeout=600
> no_timer_check acpi=off printk.time=1 cgroup_disable=memory root=/dev/sdb
> selinux=0 guestfs_verbose=1 TERM=screen
> [ 0.000000] Disabling memory control group subsystem
> [ 0.000000] PID hash table entries: 2048 (order: -2, 16384 bytes)
> [ 0.000000] freeing bootmem node 0
> [ 0.000000] Memory: 486336k/512000k available (17920k kernel code, 25664k
> reserved, 1856k data, 2952k bss, 6656k init)
> [ 0.000000] SLUB: Genslabs=19, HWalign=128, Order=0-3, MinObjects=0,
> CPUs=1, Nodes=256
> [ 0.000000] Hierarchical RCU implementation.
> [ 0.000000] NR_IRQS:512 nr_irqs:512 16
> [ 0.000000] clocksource: timebase mult[1f40000] shift[24] registered
> [ 0.000000] Console: colour dummy device 80x25
> [ 0.000000] console [hvc0] enabled
> [ 0.000000] console [hvc0] enabled
> [ 0.041700] pid_max: default: 32768 minimum: 301
> [ 0.041700] pid_max: default: 32768 minimum: 301
> [ 0.048107] Security Framework initialized
> [ 0.048107] Security Framework initialized
> [ 0.067154] SELinux: Disabled at boot.
> [ 0.067154] SELinux: Disabled at boot.
> [ 0.084262] Dentry cache hash table entries: 65536 (order: 3, 524288 bytes)
> [ 0.084262] Dentry cache hash table entries: 65536 (order: 3, 524288 bytes)
> [ 0.099618] Inode-cache hash table entries: 32768 (order: 2, 262144 bytes)
> [ 0.099618] Inode-cache hash table entries: 32768 (order: 2, 262144 bytes)
> [ 0.107083] Mount-cache hash table entries: 4096
> [ 0.107083] Mount-cache hash table entries: 4096
> [ 0.155933] Initializing cgroup subsys cpuacct
> [ 0.155933] Initializing cgroup subsys cpuacct
> [ 0.156562] Initializing cgroup subsys memory
> [ 0.156562] Initializing cgroup subsys memory
> [ 0.161423] Initializing cgroup subsys devices
> [ 0.161423] Initializing cgroup subsys devices
> [ 0.162250] Initializing cgroup subsys freezer
> [ 0.162250] Initializing cgroup subsys freezer
> [ 0.162992] Initializing cgroup subsys net_cls
> [ 0.162992] Initializing cgroup subsys net_cls
> [ 0.163913] Initializing cgroup subsys blkio
> [ 0.163913] Initializing cgroup subsys blkio
> [ 0.164843] Initializing cgroup subsys perf_event
> [ 0.164843] Initializing cgroup subsys perf_event
> [ 0.169308] ftrace: allocating 21118 entries in 8 pages
> [ 0.169308] ftrace: allocating 21118 entries in 8 pages
> [ 0.439808] POWER7 performance monitor hardware support registered
> [ 0.439808] POWER7 performance monitor hardware support registered
> [ 0.476013] Brought up 1 CPUs
> [ 0.476013] Brought up 1 CPUs
> [ 0.481103] Enabling Asymmetric SMT scheduling
> [ 0.481103] Enabling Asymmetric SMT scheduling
> [ 0.552049] devtmpfs: initialized
> [ 0.552049] devtmpfs: initialized
> [ 0.673170] atomic64 test passed
> [ 0.673170] atomic64 test passed
> [ 0.680501] NET: Registered protocol family 16
> [ 0.680501] NET: Registered protocol family 16
> [ 0.686950] IBM eBus Device Driver
> [ 0.686950] IBM eBus Device Driver
> [ 0.713306] nvram: No room to create ibm,rtas-log partition, deleting any
> obsolete OS partitions...
> [ 0.713306] nvram: No room to create ibm,rtas-log partition, deleting any
> obsolete OS partitions...
> [ 0.714363] nvram: Failed to find or create ibm,rtas-log partition, err -28
> [ 0.714363] nvram: Failed to find or create ibm,rtas-log partition, err -28
> [ 0.715042] nvram: No room to create lnx,oops-log partition, deleting any
> obsolete OS partitions...
> [ 0.715042] nvram: No room to create lnx,oops-log partition, deleting any
> obsolete OS partitions...
> [ 0.715559] nvram: Failed to find or create lnx,oops-log partition, err -28
> [ 0.715559] nvram: Failed to find or create lnx,oops-log partition, err -28
>
> Linux ppc64
> #1 SMP Mon May 1[ 0.720031] CPU Hotplug not supported by firmware -
> disabling.
> [ 0.720031] CPU Hotplug not supported by firmware - disabling.
> [ 0.740887] PCI: Probing PCI hardware
> [ 0.740887] PCI: Probing PCI hardware
> [ 0.749913] PCI host bridge to bus 0000:00
> [ 0.749913] PCI host bridge to bus 0000:00
> [ 0.751921] pci_bus 0000:00: root bus resource [io 0x10000-0x1ffff]
> [ 0.751921] pci_bus 0000:00: root bus resource [io 0x10000-0x1ffff]
> [ 0.752932] pci_bus 0000:00: root bus resource [mem
> 0x100a0000000-0x100bfffffff]
> [ 0.752932] pci_bus 0000:00: root bus resource [mem
> 0x100a0000000-0x100bfffffff]
> [ 0.765676] pci_dma_dev_setup_pSeriesLP: no DMA window found for pci
> dev=0000:00:00.0 dn=/pci@800000020000001,0/scsi@0
> [ 0.765676] pci_dma_dev_setup_pSeriesLP: no DMA window found for pci
> dev=0000:00:00.0 dn=/pci@800000020000001,0/scsi@0
> [ 0.773227] pci_dma_dev_setup_pSeriesLP: no DMA window found for pci
> dev=0000:00:01.0 dn=/pci@800000020000001,0/communication-controller@1
> [ 0.773227] pci_dma_dev_setup_pSeriesLP: no DMA window found for pci
> dev=0000:00:01.0 dn=/pci@800000020000001,0/communication-controller@1
> [ 0.787177] opal: Node not found
> [ 0.787177] opal: Node not found
> [ 0.831635] bio: create slab <bio-0> at 0
> [ 0.831635] bio: create slab <bio-0> at 0
> [ 0.854552] vgaarb: loaded
> [ 0.854552] vgaarb: loaded
> [ 0.861796] SCSI subsystem initialized
> [ 0.861796] SCSI subsystem initialized
> [ 0.873008] usbcore: registered new interface driver usbfs
> [ 0.873008] usbcore: registered new interface driver usbfs
> [ 0.874925] usbcore: registered new interface driver hub
> [ 0.874925] usbcore: registered new interface driver hub
> [ 0.877584] usbcore: registered new device driver usb
> [ 0.877584] usbcore: registered new device driver usb
> [ 0.915016] NetLabel: Initializing
> [ 0.915016] NetLabel: Initializing
> [ 0.915419] NetLabel: domain hash size = 128
> [ 0.915419] NetLabel: domain hash size = 128
> [ 0.915688] NetLabel: protocols = UNLABELED CIPSOv4
> [ 0.915688] NetLabel: protocols = UNLABELED CIPSOv4
> [ 0.921383] NetLabel: unlabeled traffic allowed by default
> [ 0.921383] NetLabel: unlabeled traffic allowed by default
> [ 0.923702] Switching to clocksource timebase
> [ 0.923702] Switching to clocksource timebase
> [ 1.354987] NET: Registered protocol family 2
> [ 1.354987] NET: Registered protocol family 2
> [ 1.366159] IP route cache hash table entries: 8192 (order: 0, 65536 bytes)
> [ 1.366159] IP route cache hash table entries: 8192 (order: 0, 65536 bytes)
> [ 1.385317] TCP established hash table entries: 16384 (order: 2, 262144
> bytes)
> [