Tejun Heo wrote:
Benjamin Herrenschmidt wrote:
--- Exception: 301 at .memset+0x60/0xfc
LR = .pcpu_alloc+0x718/0x8fc
So it's memsetting something that causes it to hash_page(), ie, faulting
in pages (vmalloc space ?) so far nothing obviously wrong....
It's probably memset() call near the end of pcpu_populate_chunk()
where percpu allocator clears the allocated areas before returning to
user. I don't think the first chunk is causing the problem as they're
all in the linear mapped area. From the second chunk on, they're on
vmalloc area and very near to the top of it, so that might be exposing
a hidden problem in paging code? BTW, for some reason, the problem is
not reproducible on my powerstation.
Sachin, can you please apply the attached patch on top of the current
linus tree, reproduce the hang and report full kernel log? Let's see
which address is causing the problem.
Here is the dmesg log captured with the debug patch.
Some of the debug messages related to PERCPU
<6>PERCPU: Embedded 2 pages/cpu @c000000001100000 s97160 r0 d33912 u524288
<6>pcpu-alloc: s97160 r0 d33912 u524288 alloc=1*1048576
<6>pcpu-alloc: [0] 0 1
<4>PERCPU: initialized 19 slots [c000000001120200,c000000001120330)
<4>PERCPU: chunk 0 relocating -1 -> 13 c000000001120380
<c000000001120380:c000000001120380>
<4>PERCPU: relocated <c0000000011202d0:c0000000011202d0>
<4>PERCPU: chunk 0 relocating 13 -> 12 c000000001120380
<c0000000011202d0:c0000000011202d0>
<4>PERCPU: relocated <c0000000011202c0:c0000000011202c0>
<4>PERCPU: chunk 0 relocating 12 -> 11 c000000001120380
<c0000000011202c0:c0000000011202c0>
<4>PERCPU: relocated <c0000000011202b0:c0000000011202b0>
<6>ehea: eth0: Physical port up
<6>ehea: External switch port is backup port
<7>irq: irq 33540 on host null mapped to virtual irq 260
<6>NET: Registered protocol family 10
<4>PERCPU: chunk 0 relocating 11 -> 10 c000000001120380
<c0000000011202b0:c0000000011202b0>
<4>PERCPU: relocated <c0000000011202a0:c0000000011202a0>
<4>PERCPU: chunk 0 relocating 10 -> 9 c000000001120380
<c0000000011202a0:c0000000011202a0>
<4>PERCPU: relocated <c000000001120290:c000000001120290>
<4>PERCPU: chunk 1 relocating -1 -> 18 c0000000db70fb00
<c0000000db70fb00:c0000000db70fb00>
<4>PERCPU: relocated <c000000001120320:c000000001120320>
<4>PERCPU: chunk 1 relocating 18 -> 16 c0000000db70fb00
<c000000001120320:c000000001120320>
<4>PERCPU: relocated <c000000001120300:c000000001120300>
<4>PERCPU: chunk 1, alloc pages [0,1)
<4>PERCPU: chunk 1, map pages [0,1)
<4>PERCPU: map 0xd00007fffff00000, 1 pages 53544
<4>PERCPU: map 0xd00007fffff80000, 1 pages 53545
<4>PERCPU: chunk 1, will clear 4096b/unit d00007fffff00000 d00007fffff80000
<3>INFO: RCU detected CPU 0 stall (t=1000 jiffies)
Thanks
-Sachin
--
---------------------------------
Sachin Sant
IBM Linux Technology Center
India Systems and Technology Labs
Bangalore, India
---------------------------------
1:mon> c 0
0:mon> t
[link register ] c00000000016d0d0 .pcpu_alloc+0x8a4/0xae4
[c0000000da427960] c00000000016d094 .pcpu_alloc+0x868/0xae4 (unreliable)
[c0000000da427a90] c000000000613c08 .snmp_mib_init+0x34/0x9c
[c0000000da427b20] d00000000262e074 .ipv6_add_dev+0x1d4/0x3e4 [ipv6]
[c0000000da427bc0] d00000000265997c .addrconf_init+0x6c/0x194 [ipv6]
[c0000000da427c50] d00000000265974c .inet6_init+0x1bc/0x34c [ipv6]
[c0000000da427ce0] c0000000000097a4 .do_one_initcall+0x88/0x1bc
[c0000000da427d90] c0000000000d59fc .SyS_init_module+0x118/0x2a4
[c0000000da427e30] c0000000000085b4 syscall_exit+0x0/0x40
--- Exception: c01 (System Call) at 00000fff8350b568
SP (fffdc3a7f50) is in userspace
0:mon> dl
<4>Crash kernel location must be 0x2000000
<6>Reserving 256MB of memory at 32MB for crashkernel (System RAM: 4096MB)
<6>Phyp-dump disabled at boot time
<6>Using pSeries machine description
<7>Page orders: linear mapping = 24, virtual = 16, io = 12
<6>Using 1TB segments
<4>Found initrd at 0xc000000003500000:0xc000000003cca9b1
<6>bootconsole [udbg0] enabled
<6>Partition configured for 2 cpus.
<6>CPU maps initialized for 2 threads per core
<7> (thread shift is 1)
<4>Starting Linux PPC64 #2 SMP Fri Sep 25 12:31:24 IST 2009
<4>-----------------------------------------------------
<4>ppc64_pft_size = 0x1a
<4>physicalMemorySize = 0x100000000
<4>htab_hash_mask = 0x7ffff
<4>-----------------------------------------------------
<6>Initializing cgroup subsys cpuset
<6>Initializing cgroup subsys cpu
<5>Linux version 2.6.31-git15 (r...@mpower6lp5) (gcc version 4.3.2
[gcc-4_3-branch revision 141291] (SUSE Linux) ) #2 SMP Fri Sep 25 12:31:24 IST
2009
<4>[boot]0012 Setup Arch
<7>Node 0 Memory:
<7>Node 2 Memory: 0x0-0xe0000000
<7>Node 3 Memory: 0xe0000000-0x100000000
<4>EEH: No capable adapters found
<6>PPC64 nvram contains 15360 bytes
<7>Using shared processor idle loop
<4>Zone PFN ranges:
<4> DMA 0x00000000 -> 0x00010000
<4> Normal 0x00010000 -> 0x00010000
<4>Movable zone start PFN for each node
<4>early_node_map[2] active PFN ranges
<4> 2: 0x00000000 -> 0x0000e000
<4> 3: 0x0000e000 -> 0x00010000
<4>Could not find start_pfn for node 0
<7>On node 0 totalpages: 0
<7>On node 2 totalpages: 57344
<7> DMA zone: 56 pages used for memmap
<7> DMA zone: 0 pages reserved
<7> DMA zone: 57288 pages, LIFO batch:1
<7>On node 3 totalpages: 8192
<7> DMA zone: 8 pages used for memmap
<7> DMA zone: 0 pages reserved
<7> DMA zone: 8184 pages, LIFO batch:0
<4>[boot]0015 Setup Done
<6>PERCPU: Embedded 2 pages/cpu @c000000001100000 s97160 r0 d33912 u524288
<6>pcpu-alloc: s97160 r0 d33912 u524288 alloc=1*1048576
<6>pcpu-alloc: [0] 0 1
<4>PERCPU: initialized 19 slots [c000000001120200,c000000001120330)
<4>PERCPU: chunk 0 relocating -1 -> 13 c000000001120380
<c000000001120380:c000000001120380>
<4>PERCPU: relocated <c0000000011202d0:c0000000011202d0>
<4>Built 3 zonelists in Node order, mobility grouping on. Total pages: 65472
<4>Policy zone: DMA
<5>Kernel command line: root=/dev/sda3 sysrq=8 insmod=sym53c8xx insmod=ipr
crashkernel=512M-:256M xmon=on
<6>PID hash table entries: 4096 (order: -1, 32768 bytes)
<4>freeing bootmem node 2
<4>freeing bootmem node 3
<6>Memory: 3897152k/4194304k available (9664k kernel code, 297152k reserved,
2944k data, 4274k bss, 576k init)
<6>Hierarchical RCU implementation.
<6>RCU-based detection of stalled CPUs is enabled.
<6>NR_IRQS:512
<4>[boot]0020 XICS Init
<4>[boot]0021 XICS Done
<7>pic: no ISA interrupt controller
<7>time_init: decrementer frequency = 512.000000 MHz
<7>time_init: processor frequency = 4704.000000 MHz
<6>clocksource: timebase mult[7d0000] shift[22] registered
<7>clockevent: decrementer mult[83126e97] shift[32] cpu[0]
<4>Console: colour dummy device 80x25
<6>console [hvc0] enabled, bootconsole disabled
<6>allocated 2621440 bytes of page_cgroup
<6>please try 'cgroup_disable=memory' option if you don't want memory cgroups
<6>Security Framework initialized
<6>SELinux: Disabled at boot.
<6>Dentry cache hash table entries: 524288 (order: 6, 4194304 bytes)
<6>Inode-cache hash table entries: 262144 (order: 5, 2097152 bytes)
<4>Mount-cache hash table entries: 4096
<6>Initializing cgroup subsys ns
<6>Initializing cgroup subsys cpuacct
<6>Initializing cgroup subsys memory
<6>Initializing cgroup subsys devices
<6>Initializing cgroup subsys freezer
<7>irq: irq 2 on host null mapped to virtual irq 16
<7>clockevent: decrementer mult[83126e97] shift[32] cpu[1]
<4>Processor 1 found.
<6>Brought up 2 CPUs
<7>Node 0 CPUs: 0-1
<7>Node 2 CPUs:
<7>Node 3 CPUs:
<7>CPU0 attaching sched-domain:
<7> domain 0: span 0-1 level SIBLING
<7> groups: 0 (cpu_power = 589) 1 (cpu_power = 589)
<7> domain 1: span 0-1 level CPU
<7> groups: 0-1 (cpu_power = 1178)
<7>CPU1 attaching sched-domain:
<7> domain 0: span 0-1 level SIBLING
<7> groups: 1 (cpu_power = 589) 0 (cpu_power = 589)
<7> domain 1: span 0-1 level CPU
<7> groups: 0-1 (cpu_power = 1178)
<6>NET: Registered protocol family 16
<6>IBM eBus Device Driver
<6>POWER6 performance monitor hardware support registered
<6>PCI: Probing PCI hardware
<7>PCI: Probing PCI hardware done
<4>bio: create slab <bio-0> at 0
<4>PERCPU: chunk 0 relocating 13 -> 12 c000000001120380
<c0000000011202d0:c0000000011202d0>
<4>PERCPU: relocated <c0000000011202c0:c0000000011202c0>
<6>vgaarb: loaded
<6>usbcore: registered new interface driver usbfs
<6>usbcore: registered new interface driver hub
<6>usbcore: registered new device driver usb
<6>Switching to clocksource timebase
<6>NET: Registered protocol family 2
<6>IP route cache hash table entries: 32768 (order: 2, 262144 bytes)
<6>TCP established hash table entries: 131072 (order: 5, 2097152 bytes)
<6>TCP bind hash table entries: 65536 (order: 5, 2097152 bytes)
<6>TCP: Hash tables configured (established 131072 bind 65536)
<6>TCP reno registered
<6>NET: Registered protocol family 1
<6>Unpacking initramfs...
<7>Switched to high resolution mode on CPU 0
<7>Switched to high resolution mode on CPU 1
<7>irq: irq 655360 on host null mapped to virtual irq 17
<7>irq: irq 655367 on host null mapped to virtual irq 18
<6>IOMMU table initialized, virtual merging enabled
<7>irq: irq 589825 on host null mapped to virtual irq 19
<7>RTAS daemon started
<6>audit: initializing netlink socket (disabled)
<5>type=2000 audit(1253862433.200:1): initialized
<6>Kprobe smoke test started
<6>Kprobe smoke test passed successfully
<6>HugeTLB registered 16 MB page size, pre-allocated 0 pages
<6>HugeTLB registered 16 GB page size, pre-allocated 0 pages
<5>VFS: Disk quotas dquot_6.5.2
<4>Dquot-cache hash table entries: 8192 (order 0, 65536 bytes)
<6>Btrfs loaded
<6>msgmni has been set to 7608
<4>PERCPU: chunk 0 relocating 12 -> 11 c000000001120380
<c0000000011202c0:c0000000011202c0>
<4>PERCPU: relocated <c0000000011202b0:c0000000011202b0>
<6>alg: No test for stdrng (krng)
<6>Block layer SCSI generic (bsg) driver version 0.4 loaded (major 254)
<6>io scheduler noop registered
<6>io scheduler anticipatory registered
<6>io scheduler deadline registered
<6>io scheduler cfq registered (default)
<6>pci_hotplug: PCI Hot Plug PCI Core version: 0.5
<6>rpaphp: RPA HOT Plug PCI Controller Driver version: 0.1
<7>vio_register_driver: driver hvc_console registering
<7>HVSI: registered 0 devices
<6>Generic RTC Driver v1.07
<6>Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
<6>pmac_zilog: 0.6 (Benjamin Herrenschmidt <b...@kernel.crashing.org>)
<6>input: Macintosh mouse button emulation as /devices/virtual/input/input0
<6>Uniform Multi-Platform E-IDE driver
<6>ide-gd driver 1.18
<6>IBM eHEA ethernet device driver (Release EHEA_0102)
<7>irq: irq 590088 on host null mapped to virtual irq 264
<6>ehea: eth0: Jumbo frames are disabled
<6>ehea: eth0 -> logical port id #2
<6>ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
<6>ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
<6>mice: PS/2 mouse device common for all mice
<6>EDAC MC: Ver: 2.1.0 Sep 25 2009
<6>usbcore: registered new interface driver hiddev
<6>usbcore: registered new interface driver usbhid
<6>usbhid: v2.6:USB HID core driver
<6>TCP cubic registered
<6>NET: Registered protocol family 15
<4>registered taskstats version 1
<4>Freeing unused kernel memory: 576k freed
<6>SysRq : Changing Loglevel
<4>Loglevel set to 8
<5>SCSI subsystem initialized
<7>vio_register_driver: driver ibmvscsi registering
<6>ibmvscsi 30000007: SRP_VERSION: 16.a
<6>scsi0 : IBM POWER Virtual SCSI Adapter 1.5.8
<6>ibmvscsi 30000007: partner initialization complete
<6>ibmvscsi 30000007: host srp version: 16.a, host partition VIO Server (1), OS
3, max io 1048576
<6>ibmvscsi 30000007: Client reserve enabled
<6>ibmvscsi 30000007: sent SRP login
<6>ibmvscsi 30000007: SRP_LOGIN succeeded
<5>scsi 0:0:1:0: Direct-Access AIX VDASD 0001 PQ: 0 ANSI: 3
<5>scsi 0:0:2:0: CD-ROM AIX VOPTA PQ: 0 ANSI: 4
<6>udevd version 128 started
<5>sd 0:0:1:0: [sda] 146800640 512-byte logical blocks: (75.1 GB/70.0 GiB)
<5>sd 0:0:1:0: [sda] Write Protect is off
<7>sd 0:0:1:0: [sda] Mode Sense: 17 00 00 08
<5>sd 0:0:1:0: [sda] Cache data unavailable
<3>sd 0:0:1:0: [sda] Assuming drive cache: write through
<5>sd 0:0:1:0: [sda] Cache data unavailable
<3>sd 0:0:1:0: [sda] Assuming drive cache: write through
<6> sda: sda1 sda2 sda3
<5>sd 0:0:1:0: [sda] Cache data unavailable
<3>sd 0:0:1:0: [sda] Assuming drive cache: write through
<5>sd 0:0:1:0: [sda] Attached SCSI disk
<6>kjournald starting. Commit interval 5 seconds
<6>EXT3 FS on sda3, internal journal
<6>EXT3-fs: mounted filesystem with writeback data mode.
<6>udevd version 128 started
<5>sd 0:0:1:0: Attached scsi generic sg0 type 0
<5>scsi 0:0:2:0: Attached scsi generic sg1 type 5
<4>sr0: scsi-1 drive
<6>Uniform CD-ROM driver Revision: 3.20
<7>sr 0:0:2:0: Attached scsi CD-ROM sr0
<6>Adding 2096320k swap on /dev/sda2. Priority:-1 extents:1 across:2096320k
<6>device-mapper: uevent: version 1.0.3
<6>device-mapper: ioctl: 4.15.0-ioctl (2009-04-01) initialised:
dm-de...@redhat.com
<6>loop: module loaded
<6>fuse init (API version 7.13)
<7>irq: irq 33539 on host null mapped to virtual irq 259
<6>ehea: eth0: Physical port up
<6>ehea: External switch port is backup port
<7>irq: irq 33540 on host null mapped to virtual irq 260
<6>NET: Registered protocol family 10
<4>PERCPU: chunk 0 relocating 11 -> 10 c000000001120380
<c0000000011202b0:c0000000011202b0>
<4>PERCPU: relocated <c0000000011202a0:c0000000011202a0>
<4>PERCPU: chunk 0 relocating 10 -> 9 c000000001120380
<c0000000011202a0:c0000000011202a0>
<4>PERCPU: relocated <c000000001120290:c000000001120290>
<4>PERCPU: chunk 1 relocating -1 -> 18 c0000000db70fb00
<c0000000db70fb00:c0000000db70fb00>
<4>PERCPU: relocated <c000000001120320:c000000001120320>
<4>PERCPU: chunk 1 relocating 18 -> 16 c0000000db70fb00
<c000000001120320:c000000001120320>
<4>PERCPU: relocated <c000000001120300:c000000001120300>
<4>PERCPU: chunk 1, alloc pages [0,1)
<4>PERCPU: chunk 1, map pages [0,1)
<4>PERCPU: map 0xd00007fffff00000, 1 pages 53544
<4>PERCPU: map 0xd00007fffff80000, 1 pages 53545
<4>PERCPU: chunk 1, will clear 4096b/unit d00007fffff00000 d00007fffff80000
<3>INFO: RCU detected CPU 0 stall (t=1000 jiffies)
<3>INFO: RCU detected CPU 0 stall (t=4000 jiffies)
<3>BUG: soft lockup - CPU#0 stuck for 61s! [modprobe:1864]
<4>Modules linked in: ipv6(+) fuse loop dm_mod sr_mod cdrom sg sd_mod
crc_t10dif ibmvscsic scsi_transport_srp scsi_tgt scsi_mod
<4>NIP: c000000000043140 LR: c00000000016d0d0 CTR: 0000000000000040
<4>REGS: c0000000da4276e0 TRAP: 0901 Not tainted (2.6.31-git15)
<4>MSR: 8000000000009032 <EE,ME,IR,DR> CR: 44224880 XER: 20000002
<4>TASK = c0000000dbbf71d0[1864] 'modprobe' THREAD: c0000000da424000 CPU: 0
<4>GPR00: 0000000000000040 c0000000da427960 c000000000b8bca8 d00007fffff00000
<4>GPR04: 0000000000000000 0000000000000000 d00007fffff00000 80000000565a6cc0
<4>GPR08: 0000000000000000 c000000001120180 c000000000c45aa0 00000000000003c0
<4>GPR12: 0000000028224882 c000000000c62600
<4>NIP [c000000000043140] .memset+0x60/0xfc
<4>LR [c00000000016d0d0] .pcpu_alloc+0x8a4/0xae4
<4>Call Trace:
<4>[c0000000da427960] [c00000000016d094] .pcpu_alloc+0x868/0xae4 (unreliable)
<4>[c0000000da427a90] [c000000000613c08] .snmp_mib_init+0x34/0x9c
<4>[c0000000da427b20] [d00000000262e074] .ipv6_add_dev+0x1d4/0x3e4 [ipv6]
<4>[c0000000da427bc0] [d00000000265997c] .addrconf_init+0x6c/0x194 [ipv6]
<4>[c0000000da427c50] [d00000000265974c] .inet6_init+0x1bc/0x34c [ipv6]
<4>[c0000000da427ce0] [c0000000000097a4] .do_one_initcall+0x88/0x1bc
<4>[c0000000da427d90] [c0000000000d59fc] .SyS_init_module+0x118/0x2a4
<4>[c0000000da427e30] [c0000000000085b4] syscall_exit+0x0/0x40
<4>Instruction dump:
<4>98860000 38c60001 409e000c b0860000 38c60002 409d000c 90860000 38c60004
<4>78a0d183 78a506a0 7c0903a6 4182002c <f8860000> f8860008 f8860010 f8860018
<3>INFO: RCU detected CPU 0 stall (t=7000 jiffies)
<3>INFO: RCU detected CPU 0 stall (t=10000 jiffies)
0:mon>
_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev