[Kernel-packages] [Bug 1791758] Re: ldisc crash on reopened tty

Guilherme G. Piccoli Tue, 08 Jan 2019 10:51:14 -0800

** Description changed:

  [Impact]
  
- The following Oops was discovered by user:
+ * Line discipline code is racy when a buffer is being flushed while the
+ tty is being initialized or reinitialized. For the first problem, an
+ upstream patch has existed since January 2018: b027e2298bd5 ("tty: fix data
+ race between tty_init_dev and flush of buf"). It is not in the Ubuntu
+ 4.4 kernel, only in 4.15 and later kernels.
  
- [684766.666639] BUG: unable to handle kernel paging request at 0000000000002268
- [684766.667642] IP: [<ffffffff814e2a5a>] n_tty_receive_buf_common+0x6a/0xae0
- [684766.668487] PGD 80000019574fe067 PUD 19574ff067 PMD 0
- [684766.669194] Oops: 0000 [#1] SMP
- [684766.669687] Modules linked in: xt_nat dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag xt_connmark ipt_REJECT nf_reject_ipv4 nf_conntrack_netlink nfnetlink veth ip6table_filter ip6_tables xt_tcpmss xt_multiport xt_conntrack iptable_filter xt_CHECKSUM xt_tcpudp iptable_mangle xt_CT iptable_raw ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_comment iptable_nat ip_tables x_tables target_core_mod configfs softdog scini(POE) ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi openvswitch(OE) nf_nat_ipv6 nf_nat_ipv4 nf_nat gre kvm_intel kvm irqbypass ttm crct10dif_pclmul drm_kms_helper crc32_pclmul ghash_clmulni_intel drm aesni_intel aes_x86_64 i2c_piix4 lrw gf128mul fb_sys_fops syscopyarea glue_helper sysfillrect ablk_helper cryptd sysimgblt joydev
- [684766.679406]  input_leds mac_hid serio_raw 8250_fintek br_netfilter bridge stp llc nf_conntrack_proto_gre nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack xfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 psmouse multipath floppy pata_acpi linear dm_multipath
- [684766.683585] CPU: 15 PID: 7470 Comm: kworker/u40:1 Tainted: P           OE   4.4.0-124-generic #148~14.04.1-Ubuntu
- [684766.684967] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
- [684766.686062] Workqueue: events_unbound flush_to_ldisc
- [684766.686703] task: ffff88165e5d8000 ti: ffff88170dc2c000 task.ti: ffff88170dc2c000
- [684766.687670] RIP: 0010:[<ffffffff814e2a5a>]  [<ffffffff814e2a5a>] n_tty_receive_buf_common+0x6a/0xae0
- [684766.688870] RSP: 0018:ffff88170dc2fd28  EFLAGS: 00010202
- [684766.689521] RAX: 0000000000000000 RBX: ffff88162c895000 RCX: 0000000000000001
- [684766.690488] RDX: 0000000000000000 RSI: ffff88162c895020 RDI: ffff8819c2d3d4d8
- [684766.691518] RBP: ffff88170dc2fdc0 R08: 0000000000000001 R09: ffffffff81ec2ba0
- [684766.692480] R10: 0000000000000004 R11: 0000000000000000 R12: ffff8819c2d3d400
- [684766.693423] R13: ffff8819c45b2670 R14: ffff8816a358c028 R15: ffff8819c2d3d400
- [684766.694390] FS:  0000000000000000(0000) GS:ffff8819d73c0000(0000) knlGS:0000000000000000
- [684766.695484] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
- [684766.696182] CR2: 0000000000002268 CR3: 0000001957520000 CR4: 0000000000360670
- [684766.697141] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
- [684766.698114] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
- [684766.699079] Stack:
- [684766.699412]  0000000000000000 ffff8819c2d3d4d8 0000000000000000 ffff8819c2d3d648
- [684766.700467]  ffff8819c2d3d620 ffff8819c9c10400 ffff88170dc2fd68 ffffffff8106312e
- [684766.701501]  ffff88170dc2fd78 0000000000000001 0000000000000000 ffff88162c895020
- [684766.702534] Call Trace:
- [684766.702905]  [<ffffffff8106312e>] ? kvm_sched_clock_read+0x1e/0x30
- [684766.703685]  [<ffffffff814e34e4>] n_tty_receive_buf2+0x14/0x20
- [684766.704505]  [<ffffffff814e5f05>] flush_to_ldisc+0xd5/0x120
- [684766.705269]  [<ffffffff81099506>] process_one_work+0x156/0x400
- [684766.706008]  [<ffffffff81099eea>] worker_thread+0x11a/0x480
- [684766.706686]  [<ffffffff81099dd0>] ? rescuer_thread+0x310/0x310
- [684766.707386]  [<ffffffff8109f3b8>] kthread+0xd8/0xf0
- [684766.707993]  [<ffffffff8109f2e0>] ? kthread_park+0x60/0x60
- [684766.708664]  [<ffffffff8181a9b5>] ret_from_fork+0x55/0x80
- [684766.709335]  [<ffffffff8109f2e0>] ? kthread_park+0x60/0x60
- [684766.709998] Code: 85 70 ff ff ff e8 97 5f 33 00 49 8d 87 20 02 00 00 c7 45 b4 00 00 00 00 48 89 45 88 49 8d 87 48 02 00 00 48 89 45 80 48 8b 45 b8 <48> 8b b0 68 22 00 00 48 8b 08 89 f0 29 c8 41 f6 87 30 01 00 00
- [684766.713290] RIP  [<ffffffff814e2a5a>] n_tty_receive_buf_common+0x6a/0xae0
- [684766.714105]  RSP <ffff88170dc2fd28>
- [684766.714609] CR2: 0000000000002268
+ * For the race between the buffer flush and the tty reopen, a patch
+ addressing the issue was recently merged for 5.0-rc1:
+ 83d817f41070 ("tty: Hold tty_ldisc_lock() during tty_reopen()"). No
+ Ubuntu kernel currently contains this patch, hence this SRU
+ request. The complete upstream patch series is in [0].
  
- The issue happened in a VM
- KDUMP was configured, so a full Kernel crashdump was created
+ * The approaches of both patches are similar: they rely on locking/semaphore
+ to prevent race conditions. Some additional patches are necessary to
+ prevent related issues, such as a potential deadlock caused by
+ prioritizing I/O servicing over releasing tty_ldisc_lock(); refer to
+ c96cf923a98d ("tty: Don't block on IO when ldisc change is pending").
+ All the necessary fixes are grouped in this SRU request.
  
- User has Ubuntu Trusty, Kernel 4.4.0-124 on its VM
+ * The symptom of the race condition between the buffer flush and the tty
+ reopen routine is a kernel crash with the following trace:
+ 
+ BUG: unable to handle kernel paging request at 0000000000002268
+ IP: [<addr>] n_tty_receive_buf_common+0x6a/0xae0
+ [...]
+ Call Trace:
+ [<addr>] ? kvm_sched_clock_read+0x1e/0x30
+ [<addr>] n_tty_receive_buf2+0x14/0x20
+ [<addr>] flush_to_ldisc+0xd5/0x120
+ [<addr>] process_one_work+0x156/0x400
+ [<addr>] worker_thread+0x11a/0x480
+ [...]
+ 
+ * A kernel crash dump was collected from a user; the analysis is in
+ comment #4 of this LP bug.
+ 
  
  [Test Case]
  
- * Deploy a Trusty KVM instance with a LTS Xenial kernel (v4.4 series)
- * SSH in frequently while system is under load, send commands before the prompt has returned.
- ----
+ * It is not trivial to trigger this fault, but the usual recipe is to
+ keep accessing a machine through SSH (or an IPMI serial console) while
+ running commands before the terminal is ready on that machine (for
+ example, by echoing into ttySx or a pts in an infinite loop).
  
- Check comment #5 for a summary about the upstream proposals to resolve
- this issue.
+ * Users who could reproduce this issue in their production
+ environments report that the patches in this SRU request fixed the
+ problem.
+ 
+ 
+ [Regression Potential]
+ 
+ * The tty subsystem is highly central and patches in that area are always
+ delicate. For example, the upstream series [0] is a re-spin (V6) due to
+ a hard-to-reproduce issue reported on the PA-RISC architecture, which
+ was found in the V5 iteration [1] and fixed by patch
+ c96cf923a98d, present in this SRU request.
+ 
+ * The patchset [0] has been in the tty-next tree since mid-November, and
+ patch b027e2298bd5 has been available upstream since January 2018 (it
+ is in both the Ubuntu 4.15 and 4.18 kernels), so the overall
+ likelihood of regressions is low.
+ 
+ * These patches were sniff-tested on the three versions (4.4, 4.15 and
+ 4.18) and showed no issues.
+ 
+ 
+ [0] https://marc.info/?l=linux-kernel&m=154103190111795
+ [1] https://marc.info/?l=linux-kernel&m=153737852618183


-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1791758

Title:
  ldisc crash on reopened tty

Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Trusty:
  Won't Fix
Status in linux source package in Xenial:
  Confirmed
Status in linux source package in Bionic:
  Confirmed
Status in linux source package in Cosmic:
  Confirmed

Bug description:
  [Impact]

  * Line discipline code is racy when a buffer is being flushed while
  the tty is being initialized or reinitialized. For the first problem,
  an upstream patch has existed since January 2018: b027e2298bd5 ("tty: fix
  data race between tty_init_dev and flush of buf"). It is not in the
  Ubuntu 4.4 kernel, only in 4.15 and later kernels.

  * For the race between the buffer flush and the tty reopen, a patch
  addressing the issue was recently merged for 5.0-rc1:
  83d817f41070 ("tty: Hold tty_ldisc_lock() during tty_reopen()"). No
  Ubuntu kernel currently contains this patch, hence this SRU
  request. The complete upstream patch series is in [0].

  * The approaches of both patches are similar: they rely on
  locking/semaphore to prevent race conditions. Some additional patches
  are necessary to prevent related issues, such as a potential deadlock
  caused by prioritizing I/O servicing over releasing tty_ldisc_lock();
  refer to c96cf923a98d ("tty: Don't block on IO when ldisc change is
  pending"). All the necessary fixes are grouped in this SRU request.

  * The symptom of the race condition between the buffer flush and the
  tty reopen routine is a kernel crash with the following trace:

  BUG: unable to handle kernel paging request at 0000000000002268
  IP: [<addr>] n_tty_receive_buf_common+0x6a/0xae0
  [...]
  Call Trace:
  [<addr>] ? kvm_sched_clock_read+0x1e/0x30
  [<addr>] n_tty_receive_buf2+0x14/0x20
  [<addr>] flush_to_ldisc+0xd5/0x120
  [<addr>] process_one_work+0x156/0x400
  [<addr>] worker_thread+0x11a/0x480
  [...]

  * A kernel crash dump was collected from a user; the analysis is in
  comment #4 of this LP bug.
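
  The locking idea behind these patches can be illustrated with a toy
  model. The sketch below is plain Python, not kernel code: ToyTty,
  reopen(), flush() and run() are hypothetical names, and a single
  threading.Lock stands in for the kernel's ldisc semaphore, which is a
  simplification. It only shows that when the reopen path holds the lock
  across teardown and rebuild, a concurrent flush cannot observe the
  half-initialized state.

```python
import threading
from contextlib import nullcontext


class ToyTty:
    """Toy stand-in for a tty; not the kernel's struct tty_struct."""

    def __init__(self):
        # Plays the role of the ldisc lock (a simplification of the real semaphore).
        self.ldisc_lock = threading.Lock()
        # Line discipline private data; None models "not yet initialized".
        self.disc_data = {"read_tail": 0}


def reopen(tty, rounds, use_lock):
    """Tear down and rebuild disc_data, as a line discipline reinit would."""
    guard = tty.ldisc_lock if use_lock else nullcontext()
    for _ in range(rounds):
        with guard:
            tty.disc_data = None
            tty.disc_data = {"read_tail": 0}


def flush(tty, rounds, use_lock):
    """Read disc_data like a flush worker; count how often it is missing."""
    guard = tty.ldisc_lock if use_lock else nullcontext()
    missing = 0
    for _ in range(rounds):
        with guard:
            data = tty.disc_data
        if data is None:
            missing += 1  # in the kernel this would be the bad dereference
    return missing


def run(use_lock, rounds=20000):
    """Run one flush thread against one reopen thread; return the miss count."""
    tty = ToyTty()
    result = []
    worker = threading.Thread(
        target=lambda: result.append(flush(tty, rounds, use_lock)))
    reopener = threading.Thread(target=reopen, args=(tty, rounds, use_lock))
    worker.start()
    reopener.start()
    worker.join()
    reopener.join()
    return result[0]
```

  With use_lock=True, run() always returns 0. With use_lock=False the
  count may be non-zero depending on thread scheduling, which mirrors how
  hard the real race is to hit.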

  
  [Test Case]

  * It is not trivial to trigger this fault, but the usual recipe is to
  keep accessing a machine through SSH (or an IPMI serial console) while
  running commands before the terminal is ready on that machine (for
  example, by echoing into ttySx or a pts in an infinite loop).

  * Users who could reproduce this issue in their production
  environments report that the patches in this SRU request fixed the
  problem.
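
  A minimal sketch of the kind of stress loop described above, assuming a
  Linux system with /dev/pts available. It is not the reporter's
  reproducer and is not confirmed to trigger the crash: it only exercises
  the two paths involved, by writing to a pty master (which queues a
  flush to the slave's line discipline) and immediately opening the
  already-open slave again by path. hammer_pty() and its rounds parameter
  are hypothetical names. On an affected kernel a loop like this could in
  principle crash the machine, so try it only in a disposable VM.

```python
import os
import pty
import tty


def hammer_pty(rounds=1000):
    """Write to a pty master while re-opening its slave; return rounds completed."""
    master_fd, slave_fd = pty.openpty()
    try:
        tty.setraw(slave_fd)               # raw mode: no echo, byte-wise reads
        slave_path = os.ttyname(slave_fd)  # device path of the slave side
        done = 0
        for _ in range(rounds):
            # Input written to the master reaches the slave asynchronously.
            os.write(master_fd, b"x")
            # Second open of a tty that is already open (the reopen path).
            fd = os.open(slave_path, os.O_RDWR | os.O_NOCTTY)
            os.close(fd)
            # Drain the byte so the buffers never fill up.
            os.read(slave_fd, 1)
            done += 1
        return done
    finally:
        os.close(slave_fd)
        os.close(master_fd)
```

  The loop is bounded and single-threaded on purpose: the flush runs in a
  kernel worker, so the open() call can still overlap with it without any
  user-space threads.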

  
  [Regression Potential]

  * The tty subsystem is highly central and patches in that area are
  always delicate. For example, the upstream series [0] is a re-spin (V6)
  due to a hard-to-reproduce issue reported on the PA-RISC architecture,
  which was found in the V5 iteration [1] and fixed by patch
  c96cf923a98d, present in this SRU request.

  * The patchset [0] has been in the tty-next tree since mid-November, and
  patch b027e2298bd5 has been available upstream since January 2018 (it
  is in both the Ubuntu 4.15 and 4.18 kernels), so the overall
  likelihood of regressions is low.

  * These patches were sniff-tested on the three versions (4.4, 4.15 and
  4.18) and showed no issues.

  
  [0] https://marc.info/?l=linux-kernel&m=154103190111795
  [1] https://marc.info/?l=linux-kernel&m=153737852618183
