I've got several systems with similar hardware which crash with
"BUG: spinlock lockup" errors on async_umap_flush_lock, such as:

BUG: spinlock lockup suspected on CPU#0, sh/1166
 lock: async_umap_flush_lock+0x0/0x20, .magic: dead4ead, .owner: swapper/23/0, .owner_cpu: 23
BUG: spinlock lockup suspected on CPU#19, scsi_eh_0/1408
 lock: async_umap_flush_lock+0x0/0x20, .magic: dead4ead, .owner: swapper/23/0, .owner_cpu: 23

(More examples below.)

In general these happen very rarely, but a specific userland workload
(lots of mongodb + sqlite reads & writes, while other CPUs are running
compute-heavy tasks) seems to trigger it within a few minutes to hours.
After 1-3 "spinlock lockup suspected" errors, the system locks up and
no longer responds to alt+sysrq.

I've gotten the crash on one system in the last couple of days with
3.7.1-gentoo, 3.8.11-gentoo, 3.8.11 vanilla, and 3.4.44 vanilla. When I
looked further back, over the past year another system crashed with
similar errors (under a similar workload) running 3.7.0-gentoo and
3.8.4-gentoo. Further back than that there are 2-3 crashes on those and
other similar systems using 2.6.x and 3.0.x, but their errors are
different enough that they may not be related.

These systems each have:

  Supermicro X8DTU-F motherboard
  2x Xeon E5645 (6 cores each + hyperthreading)
  24 GB ECC RAM
  Adaptec 51645 RAID controller w/bbu
  12x 2TB SAS disks

They are using hw raid, 11 disks in a RAID6 with 1 hot-spare; the main
partition is 16 TB. They all use loop-aes v3.6g as a replacement
loop.ko module to encrypt their / filesystem (using the aes-ni
instruction set).

3.8.11 .config pastebin: http://pastebin.com/u3BDPTvP
3.4.44 .config pastebin: http://pastebin.com/1Rpk9RVf

Generally speaking, the 3.8.x and 3.4.44 kernels were compiled with
GCC 4.7; the older 3.7.x kernels were compiled with GCC 4.6.
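In case it helps anyone comparing the reports below, here is a minimal
sketch of how the serial-console captures could be grouped by lock. It
assumes the messages keep exactly the format quoted in this mail; the
names LOCKUP_RE and group_by_lock are made up for this illustration and
are not part of any kernel tooling.

```python
import re
from collections import defaultdict

# Hypothetical helper: matches "BUG: spinlock lockup" reports in the
# format quoted in this mail (with or without "suspected"). \s+ between
# fields lets a report span two console lines.
LOCKUP_RE = re.compile(
    r"BUG: spinlock lockup(?: suspected)? on CPU#(?P<cpu>\d+),\s+"
    r"(?P<task>\S+)\s+"
    r"lock: (?P<lock>[^,\s]+),\s+"
    r"\.magic: (?P<magic>[0-9a-f]+),\s+"
    r"\.owner: (?P<owner>[^,\s]+),\s+"
    r"\.owner_cpu: (?P<owner_cpu>\d+)"
)


def group_by_lock(text):
    """Return {lock: [(waiting_cpu, waiting_task, owner, owner_cpu), ...]}."""
    groups = defaultdict(list)
    for m in LOCKUP_RE.finditer(text):
        groups[m.group("lock")].append(
            (
                int(m.group("cpu")),
                m.group("task"),
                m.group("owner"),
                int(m.group("owner_cpu")),
            )
        )
    return dict(groups)


if __name__ == "__main__":
    import sys

    # Read a captured console log on stdin and print one block per lock.
    for lock, waiters in group_by_lock(sys.stdin.read()).items():
        print(lock)
        for cpu, task, owner, owner_cpu in waiters:
            print(f"  CPU#{cpu} {task} waiting on {owner} (CPU#{owner_cpu})")
```

Run against a saved console log, this should make it easier to see
which reports share a lock and owner, and which ones involve a
different lock altogether.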
Error messages, captured by serial consoles, newest crashes first:

Host1:

3.4.44

BUG: spinlock lockup on CPU#0, john/21637
 lock: ffffffff816558d0, .magic: dead4ead, .owner: mongod/27646, .owner_cpu: 8
BUG: spinlock lockup on CPU#6, mongod/3256
 lock: ffff880621867860, .magic: dead4ead, .owner: mongod/3251, .owner_cpu: 18
BUG: spinlock lockup on CPU#20, khugepaged/735
 lock: ffff880621867860, .magic: dead4ead, .owner: mongod/3251, .owner_cpu: 18

3.8.11

BUG: spinlock lockup suspected on CPU#0, sh/1166
 lock: async_umap_flush_lock+0x0/0x20, .magic: dead4ead, .owner: swapper/23/0, .owner_cpu: 23
BUG: spinlock lockup suspected on CPU#19, scsi_eh_0/1408
 lock: async_umap_flush_lock+0x0/0x20, .magic: dead4ead, .owner: swapper/23/0, .owner_cpu: 23

3.8.11-gentoo

BUG: spinlock lockup suspected on CPU#0, swapper/0/0
 lock: async_umap_flush_lock+0x0/0x20, .magic: dead4ead, .owner: mongod/3678, .owner_cpu: 4
BUG: spinlock lockup suspected on CPU#16, mongod/3115
 lock: 0xffff880620ab47a8, .magic: dead4ead, .owner: flush-7:4/1915, .owner_cpu: 5
BUG: spinlock lockup suspected on CPU#6, khugepaged/744
 lock: 0xffff880620ab47a8, .magic: dead4ead, .owner: flush-7:4/1915, .owner_cpu: 5

3.7.1-gentoo

BUG: spinlock lockup suspected on CPU#0, john/32030
 lock: async_umap_flush_lock+0x0/0x20, .magic: dead4ead, .owner: swapper/13/0, .owner_cpu: 13
BUG: spinlock lockup suspected on CPU#19, mongod/18985
 lock: 0xffff8806221f7860, .magic: dead4ead, .owner: mongod/18975, .owner_cpu: 2
BUG: spinlock lockup suspected on CPU#3, scsi_eh_0/1407
 lock: async_umap_flush_lock+0x0/0x20, .magic: dead4ead, .owner: swapper/13/0, .owner_cpu: 13
BUG: spinlock lockup suspected on CPU#9, khugepaged/741
 lock: 0xffff8806221f7860, .magic: dead4ead, .owner: mongod/18975, .owner_cpu: 2

Host2:

3.8.4-gentoo

BUG: spinlock lockup suspected on CPU#0, swapper/0/0
 lock: async_umap_flush_lock+0x0/0x20, .magic: dead4ead, .owner: mongod/22377, .owner_cpu: 9
BUG: spinlock lockup suspected on CPU#4, mongod/3377
 lock: 0xffff880621d00f68, .magic: dead4ead, .owner: kswapd0/689, .owner_cpu: 14
BUG: spinlock lockup suspected on CPU#21, mongod/3375
 lock: 0xffff880621d00f68, .magic: dead4ead, .owner: kswapd0/689, .owner_cpu: 14

3.7.0-gentoo

BUG: spinlock lockup suspected on CPU#0, swapper/0/0
 lock: async_umap_flush_lock+0x0/0x20, .magic: dead4ead, .owner: mongo/16561, .owner_cpu: 3

(The repeated crashes on Host2 led to irreparable ext4 corruption.)

I can provide System.map files if they would be useful. I'd be happy to
try a specific kernel, add patches to harvest more information in the
event of a crash, etc.

Thanks,

--
Hank Leininger <hl...@marc.info>
3C2A 4EEE ED36 D136 18F2 1B30 47A8 D14B E13E 9C6A