Resending as plaintext.

From: Joseph Feather
Sent: Thursday, July 30, 2020 9:59 AM
To: Paul E. McKenney <paul...@linux.vnet.ibm.com>; Josh Triplett 
<j...@joshtriplett.org>
Cc: linux-kernel@vger.kernel.org <linux-kernel@vger.kernel.org>; Timothy 
Heilman <timothy.heil...@gm.com>; Wilfredo Crespo <wilfredo.cre...@gm.com>; 
Michael Barr <michael.1.b...@gm.com>
Subject: kswapd warning at kernel/rcu/tree.c:1358 rcu_advance_cbs_nowake 
causing processes to enter d state

All,
After upgrading our systems from 5.3.x to 5.4.51 we started experiencing 
problems where some processes would enter an uninterruptible sleep state.

We do not have a swap enabled or a swap mount point.

We have tried updating to 5.7.10 and still experience the same issue. 
Downgrading to 5.3.x fixes the problem.

Please let me know if you need any other data.  I've also created a ticket on 
bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=208685
Looking through dmesg we found the following error message:
[819876.687024] ------------[ cut here ]------------
[819876.687038] WARNING: CPU: 22 PID: 434 at kernel/rcu/tree.c:1358 
rcu_advance_cbs_nowake+0x53/0x60
[819876.687038] Modules linked in: binfmt_misc(E) intel_rapl_msr(E) 
dell_smbios(E) wmi_bmof(E) dell_wmi_descriptor(E) sr_mod(E) cdrom(E) dcdbas(E) 
xt_comment(E) xt_multiport(E) ip6t_rpfilter(E) ip6t_RE
JECT(E) nf_reject_ipv6(E) ipt_REJECT(E) nf_reject_ipv4(E) xt_conntrack(E) 
ebtable_nat(E) ebtable_broute(E) ip6table_nat(E) ip6table_mangle(E) 
ip6table_security(E) ip6table_raw(E) iptable_nat(E) nf_nat(
E) iptable_mangle(E) iptable_security(E) iptable_raw(E) nf_conntrack(E) 
nf_defrag_ipv6(E) nf_defrag_ipv4(E) ip_set(E) nfnetlink(E) ebtable_filter(E) 
ebtables(E) ip6table_filter(E) ip6_tables(E) iptable
_filter(E) intel_rapl_common(E) vfat(E) fat(E) skx_edac(E) nfit(E) libnvdimm(E) 
x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm(E) irqbypass(E) 
crct10dif_pclmul(E) crc32_pclmul(E) ghash_clm
ulni_intel(E) aesni_intel(E) crypto_simd(E) cryptd(E) glue_helper(E) 
wdat_wdt(E) pcspkr(E) sg(E) i2c_i801(E) lpc_ich(E) mei_me(E) mei(E) ioatdma(E) 
wmi(E) ipmi_si(E) ipmi_devintf(E) ipmi_msghandler(E)
[819876.687069]  acpi_power_meter(E) ip_tables(E) xfs(E) libcrc32c(E) sd_mod(E) 
crc32c_intel(E) megaraid_sas(E) mgag200(E) i40e(E) drm_kms_helper(E) 
syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fo
ps(E) igb(E) ptp(E) pps_core(E) drm_vram_helper(E) dca(E) ttm(E) ahci(E) 
libahci(E) drm(E) i2c_algo_bit(E) libata(E) uas(E) usb_storage(E) dm_mirror(E) 
dm_region_hash(E) dm_log(E) dm_mod(E)
[819876.687087] CPU: 22 PID: 434 Comm: kswapd0 Tainted: G            E     
5.4.51-1.el7.x86_64 #1
[819876.687087] Hardware name: Dell Inc. PowerEdge R740xd/0YNX56, BIOS 2.2.11 
06/13/2019
[819876.687090] RIP: 0010:rcu_advance_cbs_nowake+0x53/0x60
[819876.687092] Code: 85 c0 74 f1 48 89 ee 48 89 df e8 68 ff ff ff 84 c0 75 17 
48 89 df c6 07 00 0f 1f 40 00 5b 5d c3 48 89 f7 e8 1f df ff ff eb c3 <0f> 0b eb 
e5 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00
00 55 53 48 89
[819876.687093] RSP: 0018:ffffaaf58d713a40 EFLAGS: 00010002
[819876.687094] RAX: 0000000000000001 RBX: ffffffff9a2655c0 RCX: 
0000000008326518
[819876.687095] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 
ffffffff9a265140
[819876.687096] RBP: ffff9547a0eeb2c0 R08: ffff95317eefb920 R09: 
0000000130dae188
[819876.687096] R10: 0000000000000000 R11: 0000000000000001 R12: 
ffff953c2a0c1da8
[819876.687097] R13: 000000000002b2c0 R14: ffffaaf58d713b48 R15: 
ffff95317eefb6d8
[819876.687099] FS:  0000000000000000(0000) GS:ffff9547a0ec0000(0000) 
knlGS:0000000000000000
[819876.687100] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[819876.687101] CR2: 00007fc07fe2e300 CR3: 00000005ffe0a001 CR4: 
00000000007606e0
[819876.687101] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
0000000000000000
[819876.687102] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 
0000000000000400
[819876.687103] PKRU: 55555554
[819876.687103] Call Trace:
[819876.687114]  ? radix_tree_node_ctor+0x90/0x90
[819876.687115]  __call_rcu+0x2aa/0x4e0
[819876.687120]  xas_free_nodes+0x8b/0xb0
[819876.687122]  xas_store+0x1a4/0x550
[819876.687130]  shadow_lru_isolate+0xed/0x170
[819876.687132]  ? scan_shadow_nodes+0x30/0x30
[819876.687134]  ? workingset_update_node+0x70/0x70
[819876.687136]  __list_lru_walk_one+0x6d/0x170
[819876.687138]  ? workingset_update_node+0x70/0x70
[819876.687140]  list_lru_walk_one_irq+0x4c/0x70
[819876.687145]  do_shrink_slab+0x150/0x310
[819876.687148]  shrink_slab+0x239/0x2d0
[819876.687151]  shrink_node+0xd5/0x450
[819876.687154]  balance_pgdat+0x275/0x560
[819876.687157]  kswapd+0x15a/0x3d0
[819876.687160]  ? remove_wait_queue+0x60/0x60
[819876.687166]  kthread+0xf8/0x130
[819876.687169]  ? balance_pgdat+0x560/0x560
[819876.687170]  ? kthread_bind+0x10/0x10
[819876.687173]  ret_from_fork+0x35/0x40
[819876.687175] ---[ end trace ff791da2812c6a3e ]---

After that message we see our process that is in D state:
[820150.702270] INFO: task python:11885 blocked for more than 122 seconds.
[820150.702315]       Tainted: G        W   E     5.4.51-1.el7.x86_64 #1
[820150.702348] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
[820150.702387] python          D    0 11885  11872 0x00000080
[820150.702390] Call Trace:
[820150.702401]  __schedule+0x2d1/0x6c0
[820150.702404]  schedule+0x39/0xa0
[820150.702406]  schedule_timeout+0x1c8/0x290
[820150.702413]  ? rcu_accelerate_cbs+0x61/0x180
[820150.702415]  wait_for_completion+0x123/0x190
[820150.702420]  ? wake_up_q+0x70/0x70
[820150.702422]  __wait_rcu_gp+0xfd/0x130
[820150.702425]  synchronize_rcu+0x4e/0x80
[820150.702427]  ? __call_rcu+0x4e0/0x4e0
[820150.702428]  ? __bpf_trace_rcu_utilization+0x10/0x10
[820150.702432]  packet_release+0x247/0x380
[820150.702437]  __sock_release+0x3d/0xc0
[820150.702439]  sock_close+0x11/0x20
[820150.702442]  __fput+0xbe/0x250
[820150.702446]  task_work_run+0x88/0xa0
[820150.702452]  exit_to_usermode_loop+0x77/0xc7
[820150.702454]  do_syscall_64+0x182/0x1b0
[820150.702457]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[820150.702459] RIP: 0033:0x7fd85ca297a0
[820150.702464] Code: Bad RIP value.
[820150.702465] RSP: 002b:00007ffd05cc66b8 EFLAGS: 00000246 ORIG_RAX: 
0000000000000003
[820150.702466] RAX: 0000000000000000 RBX: 00007fd85d0cba50 RCX: 
00007fd85ca297a0
[820150.702467] RDX: 0000000000000000 RSI: 00007fd85d06a210 RDI: 
0000000000000003
[820150.702468] RBP: 0000000000000007 R08: 0000000000000000 R09: 
00007fd85d0cc1e7
[820150.702469] R10: 0000000000000083 R11: 0000000000000246 R12: 
0000000000000007
[820150.702470] R13: 00007fd85d06a210 R14: 00007fd85ccd4590 R15: 
00000000022750a0

-Joe


Nothing in this message is intended to constitute an electronic signature 
unless a specific statement to the contrary is included in this message.

Confidentiality Note: This message is intended only for the person or entity to 
which it is addressed. It may contain confidential and/or privileged material. 
Any review, transmission, dissemination or other use, or taking of any action 
in reliance upon this message by persons or entities other than the intended 
recipient is prohibited and may be unlawful. If you received this message in 
error, please contact the sender and delete it from your computer.

Reply via email to