** Also affects: linux (Ubuntu Oracular)
Importance: Undecided
Status: New
** Changed in: linux (Ubuntu Oracular)
Status: New => Fix Committed
https://bugs.launchpad.net/bugs/2101120
Title:
mptcp BUG 'scheduling while atomic' in
mptcp_pm_nl_append_new_local_addr
Status in linux package in Ubuntu:
Confirmed
Status in linux source package in Oracular:
Fix Committed
Bug description:
[Impact]
If MPTCP endpoints are configured on a host using an address that is
external to the host, then the kernel will create an implicit endpoint
with the host's local address when MPTCP receives its first flow. If
multiple packets for these local interfaces arrive in parallel, more
than one caller may end up in mptcp_pm_nl_append_new_local_addr,
because none of them found the address in local_addr_list during their
call to mptcp_pm_nl_get_local_id. In this case, a later
mptcp_pm_nl_append_new_local_addr call may replace, and therefore
delete, the address entry created by an earlier caller.
These deletes use synchronize_rcu, which may sleep and is therefore not
permitted in some of the contexts where this function is called. During
packet receive, the caller may be inside an RCU read-side critical
section with preemption disabled.
This can lead to a BUG / panic because synchronize_rcu is called in
softirq context.
An example stack trace:
BUG: scheduling while atomic: swapper/2/0/0x00000302
Call Trace:
<IRQ>
dump_stack_lvl (lib/dump_stack.c:117 (discriminator 1))
dump_stack (lib/dump_stack.c:124)
__schedule_bug (kernel/sched/core.c:5943)
schedule_debug.constprop.0 (arch/x86/include/asm/preempt.h:33 kernel/sched/core.c:5970)
__schedule (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 kernel/sched/features.h:29 kernel/sched/core.c:6621)
schedule (arch/x86/include/asm/preempt.h:84 kernel/sched/core.c:6804 kernel/sched/core.c:6818)
schedule_timeout (kernel/time/timer.c:2160)
wait_for_completion (kernel/sched/completion.c:96 kernel/sched/completion.c:116 kernel/sched/completion.c:127 kernel/sched/completion.c:148)
__wait_rcu_gp (include/linux/rcupdate.h:311 kernel/rcu/update.c:444)
synchronize_rcu (kernel/rcu/tree.c:3609)
mptcp_pm_nl_append_new_local_addr (net/mptcp/pm_netlink.c:966 net/mptcp/pm_netlink.c:1061)
mptcp_pm_nl_get_local_id (net/mptcp/pm_netlink.c:1164)
mptcp_pm_get_local_id (net/mptcp/pm.c:420)
subflow_check_req (net/mptcp/subflow.c:98 net/mptcp/subflow.c:213)
subflow_v4_route_req (net/mptcp/subflow.c:305)
tcp_conn_request (net/ipv4/tcp_input.c:7216)
subflow_v4_conn_request (net/mptcp/subflow.c:651)
tcp_rcv_state_process (net/ipv4/tcp_input.c:6709)
tcp_v4_do_rcv (net/ipv4/tcp_ipv4.c:1934)
tcp_v4_rcv (net/ipv4/tcp_ipv4.c:2334)
ip_protocol_deliver_rcu (net/ipv4/ip_input.c:205 (discriminator 1))
ip_local_deliver (include/linux/netfilter.h:314 include/linux/netfilter.h:308 net/ipv4/ip_input.c:254)
ip_sublist_rcv_finish (include/net/dst.h:461 net/ipv4/ip_input.c:580)
ip_sublist_rcv (net/ipv4/ip_input.c:640)
ip_list_rcv (net/ipv4/ip_input.c:675)
__netif_receive_skb_list_core (net/core/dev.c:5583 net/core/dev.c:5631)
netif_receive_skb_list_internal (net/core/dev.c:5685 net/core/dev.c:5774)
napi_complete_done (include/linux/list.h:37 include/net/gro.h:449 include/net/gro.h:444 net/core/dev.c:6114)
igb_poll (drivers/net/ethernet/intel/igb/igb_main.c:8244) igb
__napi_poll (net/core/dev.c:6582)
net_rx_action (net/core/dev.c:6653 net/core/dev.c:6787)
handle_softirqs (kernel/softirq.c:553)
__irq_exit_rcu (kernel/softirq.c:588 kernel/softirq.c:427 kernel/softirq.c:636)
irq_exit_rcu (kernel/softirq.c:651)
common_interrupt (arch/x86/kernel/irq.c:247 (discriminator 14))
</IRQ>
[Backport]
Cherry-pick the following patch from upstream:
022bfe24aad8 ("mptcp: fix 'scheduling while atomic' in
mptcp_pm_nl_append_new_local_addr")
This patch fixes the problem by skipping the replacement operation in
mptcp_pm_nl_append_new_local_addr, so that a duplicate implicit entry
is freed before it is ever inserted into local_addr_list. Instead of
the newest implicit endpoint replacing the previous one, it is
discarded without a synchronize_rcu, and the existing entry is kept.
This no-replace mode is selected only by mptcp_pm_nl_get_local_id.
[Test]
This patch has passed the upstream MPTCP test suites and has also been
tested against the reproducer that triggered the panic, which adds and
removes MPTCP endpoints with an external address that differs from the
internal address. Without this patch, the problem triggered in less
than a minute. With this patch applied, the test has run for hours
without incident.
[Potential Regression]
The regression potential is low because the behavior change is narrow.
Implicit endpoints are still created and deleted, but they are now
replaced only when a user adds an endpoint with the same local address
as an existing implicit endpoint. Replacements via
mptcp_pm_nl_get_local_id no longer occur.