On Tue, Jul 04, 2017 at 04:26:11PM +1000, Michael Ellerman wrote:
> Eryu Guan <eg...@redhat.com> writes:
> > On Fri, Jun 30, 2017 at 08:07:02PM +1000, Michael Ellerman wrote:
> >>
> >> Can you try this patch and see if it changes anything? (with the debug
> >> still applied).
> >
> > This patch fixes the crash for me. After applying this patch (with all
> > other debug patches still applied), kernel didn't print any warnings or
> > calltraces or debug messages.
>
> OK. It's not meant to fix it :)
Understood.

>
> I can't form any connection between your bisection result and that
> patch, nothing is making any sense TBH.
>
> What hardware are you on? And are you doing CPU hotplug or anything like that?

It's a "PowerVM" guest (I'm not familiar with powerpc, so I don't know what
that means) running on a Power8 host. I didn't do any CPU hotplug or anything
like that.

lscpu output:

Architecture:          ppc64le
Byte Order:            Little Endian
CPU(s):                16
On-line CPU(s) list:   0-15
Thread(s) per core:    8
Core(s) per socket:    1
Socket(s):             2
NUMA node(s):          3
Model:                 2.1 (pvr 004b 0201)
Model name:            POWER8 (architected), altivec supported
Hypervisor vendor:     (null)
Virtualization type:   full
L1d cache:             64K
L1i cache:             32K
NUMA node0 CPU(s):     0-7
NUMA node2 CPU(s):     8-15
NUMA node3 CPU(s):

>
> Can you back out the last patch I sent and try this?

I've appended the call traces from the test here and attached the full dmesg
log, which includes the boot log.

[ 74.410871] ------------[ cut here ]------------
[ 74.410895] WARNING: CPU: 0 PID: 2378 at kernel/workqueue.c:3346 alloc_unbound_pwq+0x320/0x690
[ 74.410901] Modules linked in: ext4 jbd2 mbcache sg pseries_rng ghash_generic gf128mul xts vmx_crypto nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod ibmvscsi ibmveth scsi_transport_srp
[ 74.410949] CPU: 0 PID: 2378 Comm: mount Not tainted 4.12.0.debug+ #35
[ 74.410954] task: c0000003f0447280 task.stack: c0000003f039c000
[ 74.410959] NIP: c00000000011a310 LR: c00000000011a300 CTR: c00000000011a1e4
[ 74.410963] REGS: c0000003f039f550 TRAP: 0700 Not tainted (4.12.0.debug+)
[ 74.410968] MSR: 800000010282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>
[ 74.410993] CR: 24028888 XER: 00000001
[ 74.410998] CFAR: c000000000581584 SOFTE: 1
[ 74.410998] GPR00: c00000000011a590 c0000003f039f7d0 c000000001751800 0000000000000001
[ 74.410998] GPR04: 00000000000000a0 00000000000000c0 0000000000000000 0000000000000000
[ 74.410998] GPR08: ffffffffffffffff 0000000000000000 0000000000000000 0000000000000030
[ 74.410998] GPR12: 0000000000000001 c00000000fac0000 0000000000000002 c0000003fd237000
[ 74.410998] GPR16: c0000003d1a10400 0000000000000000 0000000000000000 0000000000000002
[ 74.410998] GPR20: 0000000000000000 c0000003cb7ac560 c0000003fd0387a0 c00000000179a294
[ 74.410998] GPR24: c0000003cb7ac400 c0000003f02349c0 00000000000000a0 c0000003f0234a00
[ 74.410998] GPR28: 000000006ca6897b c0000003cb7ac400 c00000000179a294 0000000000000000
[ 74.411082] NIP [c00000000011a310] alloc_unbound_pwq+0x320/0x690
[ 74.411087] LR [c00000000011a300] alloc_unbound_pwq+0x310/0x690
[ 74.411091] Call Trace:
[ 74.411095] [c0000003f039f7d0] [c00000000011a590] alloc_unbound_pwq+0x5a0/0x690 (unreliable)
[ 74.411103] [c0000003f039f830] [c00000000011aad4] apply_wqattrs_prepare+0x1f4/0x340
[ 74.411113] [c0000003f039f8a0] [c00000000011ac5c] apply_workqueue_attrs_locked+0x3c/0xa0
[ 74.411120] [c0000003f039f8d0] [c00000000011b1a4] apply_workqueue_attrs+0x54/0x90
[ 74.411127] [c0000003f039f910] [c00000000011d774] __alloc_workqueue_key+0x184/0x5b0
[ 74.411145] [c0000003f039f9d0] [d000000015211768] ext4_fill_super+0x1c68/0x33e0 [ext4]
[ 74.411152] [c0000003f039fb10] [c0000000003910fc] mount_bdev+0x22c/0x260
[ 74.411168] [c0000003f039fbb0] [d000000015209020] ext4_mount+0x20/0x40 [ext4]
[ 74.411174] [c0000003f039fbd0] [c000000000392544] mount_fs+0x74/0x210
[ 74.411181] [c0000003f039fc80] [c0000000003c0808] vfs_kern_mount+0x78/0x220
[ 74.411188] [c0000003f039fd00] [c0000000003c61c4] do_mount+0x254/0xf70
[ 74.411194] [c0000003f039fde0] [c0000000003c7304] SyS_mount+0x94/0x100
[ 74.411201] [c0000003f039fe30] [c00000000000b190] system_call+0x38/0xe0
[ 74.411206] Instruction dump:
[ 74.411211] 554ac03e 7f8ae050 7b9c0020 2fac0000 409e0290 7f44d378 38a00000 484672cd
[ 74.411227] 60000000 7c63d278 7c630074 7863d182 <0b030000> 3ca061c8 3f42001e 60a58647
[ 74.411243] ---[ end trace b720011b125c3341 ]---
[ 74.411253] ------------[ cut here ]------------
[ 74.411258] WARNING: CPU: 0 PID: 2378 at kernel/workqueue.c:3376 alloc_unbound_pwq+0x4b0/0x690
[ 74.411262] Modules linked in: ext4 jbd2 mbcache sg pseries_rng ghash_generic gf128mul xts vmx_crypto nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod ibmvscsi ibmveth scsi_transport_srp
[ 74.411303] CPU: 0 PID: 2378 Comm: mount Tainted: G W 4.12.0.debug+ #35
[ 74.411307] task: c0000003f0447280 task.stack: c0000003f039c000
[ 74.411312] NIP: c00000000011a4a0 LR: c00000000011a490 CTR: 0000000000000000
[ 74.411316] REGS: c0000003f039f550 TRAP: 0700 Tainted: G W (4.12.0.debug+)
[ 74.411320] MSR: 800000010282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>
[ 74.411343] CR: 28028888 XER: 20000001
[ 74.411348] CFAR: c000000000581584 SOFTE: 1
[ 74.411348] GPR00: c00000000011a474 c0000003f039f7d0 c000000001751800 0000000000000001
[ 74.411348] GPR04: 00000000000000a0 00000000000000c0 0000000000000000 0000000000000000
[ 74.411348] GPR08: ffffffffffffffff 0000000000000000 0000000000000000 0000000000000000
[ 74.411348] GPR12: 0000000000008800 c00000000fac0000 0000000000000002 c0000003fd237000
[ 74.411348] GPR16: c0000003d1a10400 0000000000000000 0000000000000000 0000000000000002
[ 74.411348] GPR20: 0000000000000000 c0000003cb7ac560 c0000003fd0387a0 c00000000179a294
[ 74.411348] GPR24: c0000003cb7ac400 c0000003f11ba800 c000000001935218 c0000003f0234a00
[ 74.411348] GPR28: 00000000000000a0 c0000003cb7ac400 c00000000179a294 00000000000000a0
[ 74.411431] NIP [c00000000011a4a0] alloc_unbound_pwq+0x4b0/0x690
[ 74.411436] LR [c00000000011a490] alloc_unbound_pwq+0x4a0/0x690
[ 74.411440] Call Trace:
[ 74.411444] [c0000003f039f7d0] [c00000000011a474] alloc_unbound_pwq+0x484/0x690 (unreliable)
[ 74.411452] [c0000003f039f830] [c00000000011aad4] apply_wqattrs_prepare+0x1f4/0x340
[ 74.411459] [c0000003f039f8a0] [c00000000011ac5c] apply_workqueue_attrs_locked+0x3c/0xa0
[ 74.411465] [c0000003f039f8d0] [c00000000011b1a4] apply_workqueue_attrs+0x54/0x90
[ 74.411472] [c0000003f039f910] [c00000000011d774] __alloc_workqueue_key+0x184/0x5b0
[ 74.411488] [c0000003f039f9d0] [d000000015211768] ext4_fill_super+0x1c68/0x33e0 [ext4]
[ 74.411494] [c0000003f039fb10] [c0000000003910fc] mount_bdev+0x22c/0x260
[ 74.411510] [c0000003f039fbb0] [d000000015209020] ext4_mount+0x20/0x40 [ext4]
[ 74.411516] [c0000003f039fbd0] [c000000000392544] mount_fs+0x74/0x210
[ 74.411523] [c0000003f039fc80] [c0000000003c0808] vfs_kern_mount+0x78/0x220
[ 74.411529] [c0000003f039fd00] [c0000000003c61c4] do_mount+0x254/0xf70
[ 74.411535] [c0000003f039fde0] [c0000000003c7304] SyS_mount+0x94/0x100
[ 74.411542] [c0000003f039fe30] [c00000000000b190] system_call+0x38/0xe0
[ 74.411547] Instruction dump:
[ 74.411552] 4bffa3b9 93f9004c e93904b8 83fe0000 38a00000 7fe4fb78 e8690008 4846713d
[ 74.411567] 60000000 7fe31a78 7c630074 7863d182 <0b030000> e93904b8 39400000 7f23cb78
[ 74.411584] ---[ end trace b720011b125c3342 ]---
[ 74.411704] ------------[ cut here ]------------
[ 74.411710] WARNING: CPU: 0 PID: 2378 at kernel/workqueue.c:1788 create_worker+0x174/0x2c0
[ 74.411714] Modules linked in: ext4 jbd2 mbcache sg pseries_rng ghash_generic gf128mul xts vmx_crypto nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod ibmvscsi ibmveth scsi_transport_srp
[ 74.411755] CPU: 0 PID: 2378 Comm: mount Tainted: G W 4.12.0.debug+ #35
[ 74.411759] task: c0000003f0447280 task.stack: c0000003f039c000
[ 74.411763] NIP: c000000000114ed4 LR: c000000000114ec4 CTR: c0000000001343e0
[ 74.411768] REGS: c0000003f039f4b0 TRAP: 0700 Tainted: G W (4.12.0.debug+)
[ 74.411772] MSR: 800000010282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>
[ 74.411795] CR: 28028888 XER: 00000001
[ 74.411801] CFAR: c000000000581584 SOFTE: 1
[ 74.411801] GPR00: c000000000114ea0 c0000003f039f730 c000000001751800 0000000000000001
[ 74.411801] GPR04: 00000000000000a0 00000000000000c0 0000000000000000 00000000000c0063
[ 74.411801] GPR08: ffffffffffffffff 0000000000000000 0000000000000000 0000000000000062
[ 74.411801] GPR12: 0000000048028882 c00000000fac0000 0000000000000002 c0000003fd237000
[ 74.411801] GPR16: c0000003d1a10400 0000000000000000 0000000000000000 0000000000000002
[ 74.411801] GPR20: 0000000000000000 c0000003cb7ac560 c0000003fd0387a0 c00000000179a294
[ 74.411801] GPR24: c0000003cb7ac400 c0000003f11ba800 c000000001935218 c0000003f11baca8
[ 74.411801] GPR28: c0000003f039f790 00000000000000a0 c0000003fd25c000 c0000003f11ba800
[ 74.411884] NIP [c000000000114ed4] create_worker+0x174/0x2c0
[ 74.411888] LR [c000000000114ec4] create_worker+0x164/0x2c0
[ 74.411892] Call Trace:
[ 74.411895] [c0000003f039f730] [c000000000114ea0] create_worker+0x140/0x2c0 (unreliable)
[ 74.411903] [c0000003f039f7d0] [c00000000011a508] alloc_unbound_pwq+0x518/0x690
[ 74.411910] [c0000003f039f830] [c00000000011aad4] apply_wqattrs_prepare+0x1f4/0x340
[ 74.411916] [c0000003f039f8a0] [c00000000011ac5c] apply_workqueue_attrs_locked+0x3c/0xa0
[ 74.411923] [c0000003f039f8d0] [c00000000011b1a4] apply_workqueue_attrs+0x54/0x90
[ 74.411929] [c0000003f039f910] [c00000000011d774] __alloc_workqueue_key+0x184/0x5b0
[ 74.411946] [c0000003f039f9d0] [d000000015211768] ext4_fill_super+0x1c68/0x33e0 [ext4]
[ 74.411952] [c0000003f039fb10] [c0000000003910fc] mount_bdev+0x22c/0x260
[ 74.411968] [c0000003f039fbb0] [d000000015209020] ext4_mount+0x20/0x40 [ext4]
[ 74.411974] [c0000003f039fbd0] [c000000000392544] mount_fs+0x74/0x210
[ 74.411980] [c0000003f039fc80] [c0000000003c0808] vfs_kern_mount+0x78/0x220
[ 74.411986] [c0000003f039fd00] [c0000000003c61c4] do_mount+0x254/0xf70
[ 74.411993] [c0000003f039fde0] [c0000000003c7304] SyS_mount+0x94/0x100
[ 74.411999] [c0000003f039fe30] [c00000000000b190] system_call+0x38/0xe0
[ 74.412004] Instruction dump:
[ 74.412009] 3d220005 39298a94 e87e0040 38a00000 83a90000 38630380 7fa4eb78 4846c709
[ 74.412025] 60000000 7fa31a78 7c630074 7863d182 <0b030000> 3d420005 394a8a94 e93f04b8
[ 74.412041] ---[ end trace b720011b125c3343 ]---
[ 74.412046] ------------[ cut here ]------------
[ 74.412051] WARNING: CPU: 0 PID: 2378 at kernel/workqueue.c:1789 create_worker+0x1a8/0x2c0
[ 74.412055] Modules linked in: ext4 jbd2 mbcache sg pseries_rng ghash_generic gf128mul xts vmx_crypto nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod ibmvscsi ibmveth scsi_transport_srp
[ 74.412095] CPU: 0 PID: 2378 Comm: mount Tainted: G W 4.12.0.debug+ #35
[ 74.412099] task: c0000003f0447280 task.stack: c0000003f039c000
[ 74.412103] NIP: c000000000114f08 LR: c000000000114ef8 CTR: c0000000001343e0
[ 74.412108] REGS: c0000003f039f4b0 TRAP: 0700 Tainted: G W (4.12.0.debug+)
[ 74.412144] MSR: 800000010282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>
[ 74.412167] CR: 28028888 XER: 00000001
[ 74.412172] CFAR: c000000000581584 SOFTE: 1
[ 74.412172] GPR00: c000000000114ea0 c0000003f039f730 c000000001751800 0000000000000001
[ 74.412172] GPR04: 00000000000000a0 00000000000000c0 0000000000000000 00000000000c0063
[ 74.412172] GPR08: ffffffffffffffff 0000000000000000 0000000000000000 0000000000000062
[ 74.412172] GPR12: 0000000048028882 c00000000fac0000 0000000000000002 c0000003fd237000
[ 74.412172] GPR16: c0000003d1a10400 0000000000000000 0000000000000000 0000000000000002
[ 74.412172] GPR20: 0000000000000000 c0000003cb7ac560 c0000003fd0387a0 c00000000179a294
[ 74.412172] GPR24: c0000003cb7ac400 c0000003f11ba800 c000000001935218 c0000003f11baca8
[ 74.412172] GPR28: c0000003f039f790 00000000000000a0 c0000003fd25c000 c0000003f11ba800
[ 74.412255] NIP [c000000000114f08] create_worker+0x1a8/0x2c0
[ 74.412259] LR [c000000000114ef8] create_worker+0x198/0x2c0
[ 74.412263] Call Trace:
[ 74.412267] [c0000003f039f730] [c000000000114ea0] create_worker+0x140/0x2c0 (unreliable)
[ 74.412275] [c0000003f039f7d0] [c00000000011a508] alloc_unbound_pwq+0x518/0x690
[ 74.412281] [c0000003f039f830] [c00000000011aad4] apply_wqattrs_prepare+0x1f4/0x340
[ 74.412288] [c0000003f039f8a0] [c00000000011ac5c] apply_workqueue_attrs_locked+0x3c/0xa0
[ 74.412294] [c0000003f039f8d0] [c00000000011b1a4] apply_workqueue_attrs+0x54/0x90
[ 74.412301] [c0000003f039f910] [c00000000011d774] __alloc_workqueue_key+0x184/0x5b0
[ 74.412317] [c0000003f039f9d0] [d000000015211768] ext4_fill_super+0x1c68/0x33e0 [ext4]
[ 74.412323] [c0000003f039fb10] [c0000000003910fc] mount_bdev+0x22c/0x260
[ 74.412339] [c0000003f039fbb0] [d000000015209020] ext4_mount+0x20/0x40 [ext4]
[ 74.412345] [c0000003f039fbd0] [c000000000392544] mount_fs+0x74/0x210
[ 74.412352] [c0000003f039fc80] [c0000000003c0808] vfs_kern_mount+0x78/0x220
[ 74.412358] [c0000003f039fd00] [c0000000003c61c4] do_mount+0x254/0xf70
[ 74.412364] [c0000003f039fde0] [c0000000003c7304] SyS_mount+0x94/0x100
[ 74.412371] [c0000003f039fe30] [c00000000000b190] system_call+0x38/0xe0
[ 74.412376] Instruction dump:
[ 74.412380] 3d420005 394a8a94 e93f04b8 38a00000 83aa0000 e8690008 7fa4eb78 4846c6d5
[ 74.412396] 60000000 7fa31a78 7c630074 7863d182 <0b030000> 7fe4fb78 7fc3f378 4bfffd75
[ 74.412412] ---[ end trace b720011b125c3344 ]---
[ 74.412524] select_task_rq: CPU 160 out of range for task c0000003f1500000 (kworker/u321:0)
[ 74.412612] p->cpus_allowed:
[ 74.412616] CPU: 0 PID: 2378 Comm: mount Tainted: G W 4.12.0.debug+ #35
[ 74.412620] Call Trace:
[ 74.412625] [c0000003f039f620] [c000000000a562a8] dump_stack+0xe8/0x154 (unreliable)
[ 74.412635] [c0000003f039f660] [c000000000135b2c] try_to_wake_up+0x1bc/0x940
[ 74.412641] [c0000003f039f730] [c000000000114f44] create_worker+0x1e4/0x2c0
[ 74.412647] [c0000003f039f7d0] [c00000000011a508] alloc_unbound_pwq+0x518/0x690
[ 74.412654] [c0000003f039f830] [c00000000011aad4] apply_wqattrs_prepare+0x1f4/0x340
[ 74.412660] [c0000003f039f8a0] [c00000000011ac5c] apply_workqueue_attrs_locked+0x3c/0xa0
[ 74.412667] [c0000003f039f8d0] [c00000000011b1a4] apply_workqueue_attrs+0x54/0x90
[ 74.412673] [c0000003f039f910] [c00000000011d774] __alloc_workqueue_key+0x184/0x5b0
[ 74.412689] [c0000003f039f9d0] [d000000015211768] ext4_fill_super+0x1c68/0x33e0 [ext4]
[ 74.412700] [c0000003f039fb10] [c0000000003910fc] mount_bdev+0x22c/0x260
[ 74.412715] [c0000003f039fbb0] [d000000015209020] ext4_mount+0x20/0x40 [ext4]
[ 74.412722] [c0000003f039fbd0] [c000000000392544] mount_fs+0x74/0x210
[ 74.412728] [c0000003f039fc80] [c0000000003c0808] vfs_kern_mount+0x78/0x220
[ 74.412734] [c0000003f039fd00] [c0000000003c61c4] do_mount+0x254/0xf70
[ 74.412740] [c0000003f039fde0] [c0000000003c7304] SyS_mount+0x94/0x100
[ 74.412749] [c0000003f039fe30] [c00000000000b190] system_call+0x38/0xe0
[ 74.420022] EXT4-fs (sda5): mounted filesystem with ordered data mode. Opts: (null)

Thanks,
Eryu

>
> cheers
>
>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index c74bf39ef764..8ec3841f9689 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -3338,6 +3338,8 @@ static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs)
>  
>  	lockdep_assert_held(&wq_pool_mutex);
>  
> +	WARN_ON(cpumask_empty(attrs->cpumask));
> +
>  	/* do we already have a matching pool? */
>  	hash_for_each_possible(unbound_pool_hash, pool, hash_node, hash) {
>  		if (wqattrs_equal(pool->attrs, attrs)) {
> @@ -3366,6 +3368,8 @@ static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs)
>  	copy_workqueue_attrs(pool->attrs, attrs);
>  	pool->node = target_node;
>  
> +	WARN_ON(cpumask_empty(pool->attrs->cpumask));
> +
>  	/*
>  	 * no_numa isn't a worker_pool attribute, always clear it. See
>  	 * 'struct workqueue_attrs' comments for detail.
> @@ -5494,6 +5498,7 @@ static void __init wq_numa_init(void)
>  
>  	for_each_possible_cpu(cpu) {
>  		node = cpu_to_node(cpu);
> +		printk("%s: setting cpu %d on node %d present? %d\n", __func__, cpu, node, cpu_present(cpu));
>  		if (WARN_ON(node == NUMA_NO_NODE)) {
>  			pr_warn("workqueue: NUMA node mapping not available for cpu%d, disabling NUMA support\n", cpu);
>  			/* happens iff arch is bonkers, let's just proceed */
> @@ -5502,6 +5507,16 @@ static void __init wq_numa_init(void)
>  		cpumask_set_cpu(cpu, tbl[node]);
>  	}
>  
> +	for_each_possible_cpu(cpu) {
> +		struct worker_pool *pool;
> +
> +		for_each_cpu_worker_pool(pool, cpu) {
> +			if (cpumask_empty(pool->attrs->cpumask))
> +				printk("%s: cpumask EMPTY! for pool %p on cpu %d\n", __func__, pool, cpu);
> +			printk("%s: pool %p on cpu %d node = %d\n", __func__, pool, cpu, pool->node);
> +		}
> +	}
> +
>  	wq_numa_possible_cpumask = tbl;
>  	wq_numa_enabled = true;
>  }
dmesg.log.bz2
Description: BZip2 compressed data