** Description changed:

  [Impact]

- The following Oops was discovered by user:
+  * Line discipline code is racy when a buffer is being flushed while the
+ tty is being initialized or reinitialized. For the first problem, we
+ have an upstream patch since January 2018: b027e2298bd5 ("tty: fix data
+ race between tty_init_dev and flush of buf") - although it is not in
+ Ubuntu kernel 4.4, only in kernels 4.15 and subsequent ones.

- [684766.666639] BUG: unable to handle kernel paging request at 0000000000002268
- [684766.667642] IP: [<ffffffff814e2a5a>] n_tty_receive_buf_common+0x6a/0xae0
- [684766.668487] PGD 80000019574fe067 PUD 19574ff067 PMD 0
- [684766.669194] Oops: 0000 [#1] SMP
- [684766.669687] Modules linked in: xt_nat dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag xt_connmark ipt_REJECT nf_reject_ipv4 nf_conntrack_netlink nfnetlink veth ip6table_filter ip6_tables xt_tcpmss xt_multiport xt_conntrack iptable_filter xt_CHECKSUM xt_tcpudp iptable_mangle xt_CT iptable_raw ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_comment iptable_nat ip_tables x_tables target_core_mod configfs softdog scini(POE) ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi openvswitch(OE) nf_nat_ipv6 nf_nat_ipv4 nf_nat gre kvm_intel kvm irqbypass ttm crct10dif_pclmul drm_kms_helper crc32_pclmul ghash_clmulni_intel drm aesni_intel aes_x86_64 i2c_piix4 lrw gf128mul fb_sys_fops syscopyarea glue_helper sysfillrect ablk_helper cryptd sysimgblt joydev
- [684766.679406] input_leds mac_hid serio_raw 8250_fintek br_netfilter bridge stp llc nf_conntrack_proto_gre nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack xfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 psmouse multipath floppy pata_acpi linear dm_multipath
- [684766.683585] CPU: 15 PID: 7470 Comm: kworker/u40:1 Tainted: P OE 4.4.0-124-generic #148~14.04.1-Ubuntu
- [684766.684967] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
- [684766.686062] Workqueue: events_unbound flush_to_ldisc
- [684766.686703] task: ffff88165e5d8000 ti: ffff88170dc2c000 task.ti: ffff88170dc2c000
- [684766.687670] RIP: 0010:[<ffffffff814e2a5a>] [<ffffffff814e2a5a>] n_tty_receive_buf_common+0x6a/0xae0
- [684766.688870] RSP: 0018:ffff88170dc2fd28 EFLAGS: 00010202
- [684766.689521] RAX: 0000000000000000 RBX: ffff88162c895000 RCX: 0000000000000001
- [684766.690488] RDX: 0000000000000000 RSI: ffff88162c895020 RDI: ffff8819c2d3d4d8
- [684766.691518] RBP: ffff88170dc2fdc0 R08: 0000000000000001 R09: ffffffff81ec2ba0
- [684766.692480] R10: 0000000000000004 R11: 0000000000000000 R12: ffff8819c2d3d400
- [684766.693423] R13: ffff8819c45b2670 R14: ffff8816a358c028 R15: ffff8819c2d3d400
- [684766.694390] FS: 0000000000000000(0000) GS:ffff8819d73c0000(0000) knlGS:0000000000000000
- [684766.695484] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
- [684766.696182] CR2: 0000000000002268 CR3: 0000001957520000 CR4: 0000000000360670
- [684766.697141] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
- [684766.698114] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
- [684766.699079] Stack:
- [684766.699412] 0000000000000000 ffff8819c2d3d4d8 0000000000000000 ffff8819c2d3d648
- [684766.700467] ffff8819c2d3d620 ffff8819c9c10400 ffff88170dc2fd68 ffffffff8106312e
- [684766.701501] ffff88170dc2fd78 0000000000000001 0000000000000000 ffff88162c895020
- [684766.702534] Call Trace:
- [684766.702905] [<ffffffff8106312e>] ? kvm_sched_clock_read+0x1e/0x30
- [684766.703685] [<ffffffff814e34e4>] n_tty_receive_buf2+0x14/0x20
- [684766.704505] [<ffffffff814e5f05>] flush_to_ldisc+0xd5/0x120
- [684766.705269] [<ffffffff81099506>] process_one_work+0x156/0x400
- [684766.706008] [<ffffffff81099eea>] worker_thread+0x11a/0x480
- [684766.706686] [<ffffffff81099dd0>] ? rescuer_thread+0x310/0x310
- [684766.707386] [<ffffffff8109f3b8>] kthread+0xd8/0xf0
- [684766.707993] [<ffffffff8109f2e0>] ? kthread_park+0x60/0x60
- [684766.708664] [<ffffffff8181a9b5>] ret_from_fork+0x55/0x80
- [684766.709335] [<ffffffff8109f2e0>] ? kthread_park+0x60/0x60
- [684766.709998] Code: 85 70 ff ff ff e8 97 5f 33 00 49 8d 87 20 02 00 00 c7 45 b4 00 00 00 00 48 89 45 88 49 8d 87 48 02 00 00 48 89 45 80 48 8b 45 b8 <48> 8b b0 68 22 00 00 48 8b 08 89 f0 29 c8 41 f6 87 30 01 00 00
- [684766.713290] RIP [<ffffffff814e2a5a>] n_tty_receive_buf_common+0x6a/0xae0
- [684766.714105] RSP <ffff88170dc2fd28>
- [684766.714609] CR2: 0000000000002268
+  * For the race between the buffer flush and the tty reopen, we have a
+ patch that addresses this issue, recently merged for 5.0-rc1:
+ 83d817f41070 ("tty: Hold tty_ldisc_lock() during tty_reopen()"). No
+ Ubuntu kernel currently contains this patch, hence we're hereby
+ submitting the SRU request. The complete upstream patch series for this
+ is in [0].

- The issue happened in a VM
- KDUMP was configured, so a full Kernel crashdump was created
+  * The approaches of both patches are similar: they rely on
+ locking/semaphores to prevent race conditions. Some additional patches
+ are necessary to prevent correlated issues, like preventing a potential
+ deadlock due to bad prioritization in servicing I/O over releasing
+ tty_ldisc_lock() - refer to c96cf923a98d ("tty: Don't block on IO when
+ ldisc change is pending"). All the necessary fixes are grouped here in
+ this SRU request.

- User has Ubuntu Trusty, Kernel 4.4.0-124 on its VM
+  * The symptom of the race condition between the buffer flush and the tty
+ reopen routine is a kernel crash with the following trace:
+
+ BUG: unable to handle kernel paging request at 0000000000002268
+ IP: [<addr>] n_tty_receive_buf_common+0x6a/0xae0
+ [...]
+ Call Trace:
+ [<addr>] ? kvm_sched_clock_read+0x1e/0x30
+ [<addr>] n_tty_receive_buf2+0x14/0x20
+ [<addr>] flush_to_ldisc+0xd5/0x120
+ [<addr>] process_one_work+0x156/0x400
+ [<addr>] worker_thread+0x11a/0x480
+ [...]
+
+  * A kernel crash dump was collected from a user; the analysis is in
+ comment #4 in this LP.
+

  [Test Case]

- * Deploy a Trusty KVM instance with a LTS Xenial kernel (v4.4 series)
- * SSH in frequently while system is under load, send commands before the prompt has returned.
- ----
+  * It is not trivial to trigger this fault, but the usual recipe is to
+ keep accessing a machine through SSH (or IPMI serial console) and in
+ some way run commands before the terminal is ready in that machine (like
+ hacking some echo into ttySx or pts in an infinite loop).

- Check comment #5 for a summary about the upstream proposals to resolve
- this issue.
+  * We have reports from users who could reproduce this issue in their
+ production environment, and the patches in this SRU request fixed the
+ problem.
+
+
+ [Regression Potential]
+
+  * The tty subsystem is highly central and patches in that area are
+ always delicate. For example, the upstream series [0] is a re-spin (V6)
+ due to a hard-to-reproduce issue reported on the PA-RISC architecture,
+ which was found in the V5 iteration [1] but was fixed by the patch
+ c96cf923a98d, present in this SRU request.
+
+  * The patchset [0] has been in the tty-next tree since mid-November, and
+ the patch b027e2298bd5 has been available upstream since January 2018
+ (it's available in both Ubuntu kernels 4.15 and 4.18), so the overall
+ likelihood of regressions is low.
+
+  * These patches were sniff-tested for the 3 versions (4.4, 4.15 and
+ 4.18) and didn't show any issues.
+
+
+ [0] https://marc.info/?l=linux-kernel&m=154103190111795
+ [1] https://marc.info/?l=linux-kernel&m=153737852618183
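
For readers who want a picture of the locking approach described in the
[Impact] section, here is a minimal userspace sketch in Python. It is not
kernel code and not taken from the patches: ToyTty, ToyLdiscData and
stress() are hypothetical names, and a plain mutex stands in for the
kernel's ldisc lock. It only models the idea that the reopen path holds
the lock while the line discipline state is torn down and rebuilt, so the
flush path can never observe the half-initialized state.

```python
import threading


class ToyLdiscData:
    """Hypothetical stand-in for the line discipline's private state."""

    def __init__(self):
        self.read_head = 0


class ToyTty:
    """Hypothetical toy model of a tty; not a kernel structure."""

    def __init__(self):
        # A plain mutex models the ldisc lock (an assumption of this sketch).
        self.ldisc_lock = threading.Lock()
        self.disc_data = ToyLdiscData()

    def reopen(self):
        # Reinitialization drops the old state and builds a new one.
        # Holding the lock across both steps hides the window in which
        # disc_data is None from the flush path.
        with self.ldisc_lock:
            self.disc_data = None
            self.disc_data = ToyLdiscData()

    def flush(self):
        # The flush path takes the same lock before dereferencing.
        with self.ldisc_lock:
            self.disc_data.read_head += 1


def stress(iterations=2000):
    """Run a flusher and a reopener concurrently; return observed errors."""
    tty = ToyTty()
    errors = []

    def flusher():
        try:
            for _ in range(iterations):
                tty.flush()
        except AttributeError as exc:
            # In this model, touching None plays the role of the oops.
            errors.append(exc)

    def reopener():
        for _ in range(iterations):
            tty.reopen()

    threads = [threading.Thread(target=flusher),
               threading.Thread(target=reopener)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors


if __name__ == "__main__":
    print("errors observed:", len(stress()))
```

Removing the lock from reopen() in this toy model reopens the window in
which flush() can dereference None, which is the analogue of the crash in
the trace above. Whether it actually fires depends on thread scheduling,
much like the real bug.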
-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1791758

Title:
  ldisc crash on reopened tty

Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Trusty:
  Won't Fix
Status in linux source package in Xenial:
  Confirmed
Status in linux source package in Bionic:
  Confirmed
Status in linux source package in Cosmic:
  Confirmed

Bug description:
  [Impact]

   * Line discipline code is racy when a buffer is being flushed while the
  tty is being initialized or reinitialized. For the first problem, we
  have an upstream patch since January 2018: b027e2298bd5 ("tty: fix data
  race between tty_init_dev and flush of buf") - although it is not in
  Ubuntu kernel 4.4, only in kernels 4.15 and subsequent ones.

   * For the race between the buffer flush and the tty reopen, we have a
  patch that addresses this issue, recently merged for 5.0-rc1:
  83d817f41070 ("tty: Hold tty_ldisc_lock() during tty_reopen()"). No
  Ubuntu kernel currently contains this patch, hence we're hereby
  submitting the SRU request. The complete upstream patch series for this
  is in [0].

   * The approaches of both patches are similar: they rely on
  locking/semaphores to prevent race conditions. Some additional patches
  are necessary to prevent correlated issues, like preventing a potential
  deadlock due to bad prioritization in servicing I/O over releasing
  tty_ldisc_lock() - refer to c96cf923a98d ("tty: Don't block on IO when
  ldisc change is pending"). All the necessary fixes are grouped here in
  this SRU request.

   * The symptom of the race condition between the buffer flush and the tty
  reopen routine is a kernel crash with the following trace:

  BUG: unable to handle kernel paging request at 0000000000002268
  IP: [<addr>] n_tty_receive_buf_common+0x6a/0xae0
  [...]
  Call Trace:
  [<addr>] ? kvm_sched_clock_read+0x1e/0x30
  [<addr>] n_tty_receive_buf2+0x14/0x20
  [<addr>] flush_to_ldisc+0xd5/0x120
  [<addr>] process_one_work+0x156/0x400
  [<addr>] worker_thread+0x11a/0x480
  [...]

   * A kernel crash dump was collected from a user; the analysis is in
  comment #4 in this LP.

  [Test Case]

   * It is not trivial to trigger this fault, but the usual recipe is to
  keep accessing a machine through SSH (or IPMI serial console) and in
  some way run commands before the terminal is ready in that machine (like
  hacking some echo into ttySx or pts in an infinite loop).

   * We have reports from users who could reproduce this issue in their
  production environment, and the patches in this SRU request fixed the
  problem.

  [Regression Potential]

   * The tty subsystem is highly central and patches in that area are
  always delicate. For example, the upstream series [0] is a re-spin (V6)
  due to a hard-to-reproduce issue reported on the PA-RISC architecture,
  which was found in the V5 iteration [1] but was fixed by the patch
  c96cf923a98d, present in this SRU request.

   * The patchset [0] has been in the tty-next tree since mid-November, and
  the patch b027e2298bd5 has been available upstream since January 2018
  (it's available in both Ubuntu kernels 4.15 and 4.18), so the overall
  likelihood of regressions is low.

   * These patches were sniff-tested for the 3 versions (4.4, 4.15 and
  4.18) and didn't show any issues.

  [0] https://marc.info/?l=linux-kernel&m=154103190111795
  [1] https://marc.info/?l=linux-kernel&m=153737852618183

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1791758/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp