2018-09-10 3:24 GMT+09:00 Eric Dumazet <eduma...@google.com>:
> On Sun, Sep 9, 2018 at 10:47 AM Taehee Yoo <ap420...@gmail.com> wrote:
>>
>> A kernel crash occurs when a defragmented packet is fragmented
>> in ip_do_fragment().
>> In the defragment routine, skb_orphan() is called and
>> skb->ip_defrag_offset is set, but skb->sk and
>> skb->ip_defrag_offset are members of the same union, so
>> frag->sk is not NULL.
>> Hence the crash occurs in the skb->sk check in ip_do_fragment() when
>> a defragmented packet is fragmented.
>>
>> test commands:
>> %iptables -t nat -I POSTROUTING -j MASQUERADE
>> %hping3 192.168.4.2 -s 1000 -p 2000 -d 60000
>>
>> splat looks like:
>>
>> v2:
>> - clear skb->sk at reassembly routine. (Eric Dumazet)
>>
>> Fixes: fa0f527358bd ("ip: use rb trees for IP frag queue.")
>> Suggested-by: Eric Dumazet <eduma...@google.com>
>> Signed-off-by: Taehee Yoo <ap420...@gmail.com>
>> ---
>>  net/ipv4/ip_fragment.c                  | 1 +
>>  net/ipv6/netfilter/nf_conntrack_reasm.c | 1 +
>>  2 files changed, 2 insertions(+)
>>
>> diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
>> index 88281fbce88c..e7227128df2c 100644
>> --- a/net/ipv4/ip_fragment.c
>> +++ b/net/ipv4/ip_fragment.c
>> @@ -599,6 +599,7 @@ static int ip_frag_reasm(struct ipq *qp, struct sk_buff *skb,
>>                 nextp = &fp->next;
>>                 fp->prev = NULL;
>>                 memset(&fp->rbnode, 0, sizeof(fp->rbnode));
>> +               fp->sk = NULL;
>>                 head->data_len += fp->len;
>>                 head->len += fp->len;
>>                 if (head->ip_summed != fp->ip_summed)
>> diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
>> index 2a14d8b65924..8f68a518d9db 100644
>> --- a/net/ipv6/netfilter/nf_conntrack_reasm.c
>> +++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
>> @@ -445,6 +445,7 @@ nf_ct_frag6_reasm(struct frag_queue *fq, struct sk_buff *prev, struct net_devic
>>                 else if (head->ip_summed == CHECKSUM_COMPLETE)
>>                         head->csum = csum_add(head->csum, fp->csum);
>>                 head->truesize += fp->truesize;
>> +               fp->sk = NULL;
>
> This is not needed. IPv6 paths were not changed by recent commits.
>
>>         }
>>         sub_frag_mem_limit(fq->q.net, head->truesize);
>>
>> --
>> 2.17.1
>>

Hi Eric,
Thank you for the review!

I think the netfilter-side IPv6 code change is needed because the
netfilter IPv6 defrag routine also sets fp->ip_defrag_offset, so
fp->sk will not be NULL.
I also think the crashes in ip_do_fragment() and ip6_fragment() are
actually the same bug.

My IPv6 test environment is below:

PC1<----------------->FW<----------------->PC2
cd::02/64     cd::01/64  ab::01/64     ab::02/64

FW command:
%ip6tables -t nat -I POSTROUTING -j MASQUERADE

PC2 command:
%ping6 cd::02 -s 60000

FW crash message:
[ 502.676552] kernel BUG at net/ipv6/ip6_output.c:658!
[ 502.682641] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[ 502.683545] CPU: 1 PID: 17 Comm: ksoftirqd/1 Not tainted 4.19.0-rc2+ #12
[ 502.692231] Hardware name: To be filled by O.E.M. To be filled by O.E.M./Aptio CRB, BIOS 5.6.5 07/08/2015
[ 502.692231] RIP: 0010:ip6_fragment+0x16fa/0x3580
[ 502.692231] Code: 45 00 f8 0f 85 c3 03 00 00 49 8d 7f 18 48 89 f8 48 c1 e8 03 42 80 3c 28 00 0f 85 23 18 00 00 49 83 7f 18 00 0f 84 22 fe ff ff <0f> 0b 48 85 c0 0f 85 43 02 00 00 49 83 e6 fe 48 b8 00 00 00 00 00
[ 502.692231] RSP: 0018:ffff88011561eb58 EFLAGS: 00010202
[ 502.692231] RAX: 1ffff1002142e37b RBX: ffff8801062c16c0 RCX: 0000000000000004
[ 502.692231] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88010a171bd8
[ 502.692231] RBP: ffffed0022ac3d89 R08: ffffed002142e395 R09: ffffed002142e395
[ 502.692231] R10: 0000000000000001 R11: ffffed002142e394 R12: 0000000000000028
[ 502.692231] R13: dffffc0000000000 R14: ffff88010a171ca4 R15: ffff88010a171bc0
[ 502.692231] FS: 0000000000000000(0000) GS:ffff880116600000(0000) knlGS:0000000000000000
[ 502.692231] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 502.692231] CR2: 0000000002452c64 CR3: 0000000105fea000 CR4: 00000000001006e0
[ 502.692231] Call Trace:
[ 502.692231] ? ip6_forward_finish+0x370/0x370
[ 502.692231] ? ip6_forward+0x34e0/0x34e0
[ 502.813859] ? ip6_mtu+0x24d/0x310
[ 502.813859] ? ip6_negative_advice+0x160/0x160
[ 502.813859] ? rcu_is_watching+0x77/0x120
[ 502.813859] ? nf_nat_ipv6_out+0x206/0x4d0 [nf_nat_ipv6]
[ 502.813859] ? ip6_finish_output+0x35e/0x920
[ 502.813859] ip6_output+0x1e5/0x870
[ 502.813859] ? ip6_finish_output+0x920/0x920
[ 502.813859] ? __lock_acquire+0x4500/0x4500
[ 502.813859] ? __nf_nat_alloc_null_binding+0x20c/0x340 [nf_nat]
[ 502.813859] ? ip6_fragment+0x3580/0x3580
[ 502.813859] ip6_forward+0x1328/0x34e0
[ 502.813859] ? nf_nat_inet_fn+0x446/0x970 [nf_nat]
[ 502.813859] ? ip6_autoflowlabel+0xb0/0xb0
[ 502.813859] ? ip6_route_input+0x639/0xc80
[ 502.813859] ? save_trace+0x300/0x300
[ 502.813859] ? ip6_route_info_create+0x35e0/0x35e0
[ 502.813859] ? ip6_rcv_finish_core.isra.10+0x152/0x520
[ 502.813859] ? nf_hook.constprop.19+0x780/0x780
[ 502.813859] ? nf_nat_ipv6_out+0x4d0/0x4d0 [nf_nat_ipv6]
[ 502.813859] ? ipv6_rcv+0x40b/0x500
[ 502.813859] ? ip6_autoflowlabel+0xb0/0xb0
[ 502.813859] ipv6_rcv+0x40b/0x500
[ 502.813859] ? ip6_rcv_core.isra.14+0x1dd0/0x1dd0
[ 502.813859] ? lock_acquire+0x196/0x470
[ 502.813859] ? ip6_rcv_finish_core.isra.10+0x520/0x520
[ 502.813859] ? ip6_rcv_core.isra.14+0x1dd0/0x1dd0
[ 502.813859] __netif_receive_skb_one_core+0x115/0x1a0
[ 502.813859] ? rcu_is_watching+0x77/0x120
[ 502.813859] ? __netif_receive_skb_core+0x2ac0/0x2ac0
[ 502.813859] ? _rcu_barrier_trace+0x400/0x400
[ 502.813859] ? rcu_pm_notify+0xc0/0xc0
[ 502.813859] netif_receive_skb_internal+0xd8/0x570
[ 502.813859] ? dev_cpu_dead+0x980/0x980
[ 502.813859] ? rcu_read_lock_sched_held+0x114/0x130
[ 502.813859] ? kmem_cache_alloc+0x1ea/0x280
[ 502.813859] ? memset+0x1f/0x40
[ 502.813859] ? rcu_pm_notify+0xc0/0xc0
[ 502.813859] napi_gro_receive+0x344/0x410
[ 502.813859] ? dev_gro_receive+0x29b0/0x29b0
[ 502.813859] ? build_skb+0x63/0x2b0
[ 502.813859] ? eth_gro_receive+0x8f0/0x8f0
[ 502.813859] ? __build_skb+0x3b0/0x3b0
[ 502.813859] igb_poll+0x16e1/0x4650
[ 502.813859] ? igb_alloc_rx_buffers+0xa80/0xa80
[ 502.813859] ? sched_clock_cpu+0x126/0x170
[ 502.813859] ? stop_critical_timings+0x420/0x420
[ 502.813859] ? net_rx_action+0x3f0/0x1440
[ 502.813859] ? net_rx_action+0x3f0/0x1440
[ 502.813859] ? rcu_pm_notify+0xc0/0xc0
[ 502.813859] net_rx_action+0x5db/0x1440
[ 502.813859] ? _raw_spin_unlock+0x24/0x30
[ 502.813859] ? napi_complete_done+0x4c0/0x4c0
[ 502.813859] ? pfifo_fast_peek+0x1c0/0x1c0
[ 502.813859] ? cyc2ns_read_end+0x10/0x10
[ 502.813859] ? save_trace+0x300/0x300
[ 502.813859] ? save_trace+0x300/0x300
[ 502.813859] ? sched_clock_cpu+0x126/0x170
[ 502.813859] ? find_held_lock+0x39/0x1c0
[ 502.813859] ? lock_acquire+0x196/0x470
[ 502.813859] ? check_flags.part.36+0x450/0x450
[ 502.813859] ? check_flags.part.36+0x450/0x450
[ 502.813859] ? __run_timers+0x6ff/0x990
[ 502.813859] ? do_raw_spin_unlock+0xa5/0x330
[ 502.813859] ? do_raw_spin_trylock+0x1a0/0x1a0
[ 502.813859] ? do_raw_spin_trylock+0x101/0x1a0
[ 502.813859] ? _raw_spin_unlock+0x24/0x30
[ 502.813859] ? net_tx_action+0x6cb/0xad0
[ 502.813859] ? dev_queue_xmit_nit+0xe70/0xe70
[ 502.813859] ? _raw_spin_unlock_irq+0x31/0x40
[ 502.813859] ? stop_critical_timings+0x420/0x420
[ 502.813859] ? finish_task_switch+0x183/0x740
[ 502.813859] ? __switch_to_asm+0x30/0x60
[ 502.813859] ? __switch_to_asm+0x24/0x60
[ 502.813859] ? __do_softirq+0x263/0xa11
[ 502.813859] ? __do_softirq+0x263/0xa11
[ 502.813859] ? rcu_pm_notify+0xc0/0xc0
[ 502.813859] __do_softirq+0x2a5/0xa11
[ 502.813859] ? __sched_text_start+0x8/0x8
[ 502.813859] ? __irqentry_text_end+0x1fa51c/0x1fa51c
[ 502.813859] ? smpboot_thread_fn+0xd5/0x700
[ 502.813859] ? tracer_hardirqs_on+0x420/0x420
[ 502.813859] ? trace_hardirqs_on_thunk+0x1a/0x1c
[ 502.813859] ? run_ksoftirqd+0xb/0x50
[ 502.813859] ? trace_hardirqs_off+0x6b/0x210
[ 502.813859] ? trace_hardirqs_on_caller+0x210/0x210
[ 502.813859] ? takeover_tasklets+0x860/0x860
[ 502.813859] ? takeover_tasklets+0x860/0x860
[ 502.813859] run_ksoftirqd+0x24/0x50
[ 502.813859] smpboot_thread_fn+0x3cd/0x700
[ 502.813859] ? sort_range+0x20/0x20
[ 502.813859] ? __kthread_parkme+0xb6/0x180
[ 502.813859] ? sort_range+0x20/0x20
[ 502.813859] kthread+0x322/0x3e0
[ 502.813859] ? kthread_create_worker_on_cpu+0xc0/0xc0
[ 502.813859] ret_from_fork+0x3a/0x50
[ 502.813859] Modules linked in: ip6t_MASQUERADE ip6table_nat nf_nat_ipv6 nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip6_tables ip_tables x_tables
[ 503.279587] ---[ end trace 49127255558e40c3 ]---
[ 503.284881] RIP: 0010:ip6_fragment+0x16fa/0x3580
[ 503.290150] Code: 45 00 f8 0f 85 c3 03 00 00 49 8d 7f 18 48 89 f8 48 c1 e8 03 42 80 3c 28 00 0f 85 23 18 00 00 49 83 7f 18 00 0f 84 22 fe ff ff <0f> 0b 48 85 c0 0f 85 43 02 00 00 49 83 e6 fe 48 b8 00 00 00 00 00
[ 503.311356] RSP: 0018:ffff88011561eb58 EFLAGS: 00010202
[ 503.317328] RAX: 1ffff1002142e37b RBX: ffff8801062c16c0 RCX: 0000000000000004
[ 503.325438] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88010a171bd8
[ 503.333574] RBP: ffffed0022ac3d89 R08: ffffed002142e395 R09: ffffed002142e395
[ 503.341686] R10: 0000000000000001 R11: ffffed002142e394 R12: 0000000000000028
[ 503.349803] R13: dffffc0000000000 R14: ffff88010a171ca4 R15: ffff88010a171bc0
[ 503.357916] FS: 0000000000000000(0000) GS:ffff880116600000(0000) knlGS:0000000000000000
[ 503.367102] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 503.373642] CR2: 0000000002452c64 CR3: 0000000105fea000 CR4: 00000000001006e0
[ 503.381803] Kernel panic - not syncing: Fatal exception in interrupt
[ 503.382721] Kernel Offset: 0xa000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 503.382721] Rebooting in 5 seconds..
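
For readers following the thread, the aliasing discussed above can be reproduced outside the kernel. What follows is only a user-space sketch in C, not kernel code. The names toy_sock, toy_skb, toy_defrag_queue(), toy_frag_would_bug() and toy_reasm_fixup() are hypothetical and exist only for illustration. The sketch assumes, as the patch description states, that sk and ip_defrag_offset share one union, and it further assumes that ip_defrag_offset is an int.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sock; contents do not matter here. */
struct toy_sock {
	int unused;
};

/*
 * Hypothetical, heavily reduced model of the part of struct sk_buff
 * discussed in this thread: sk and ip_defrag_offset occupy the same
 * storage (assumption: ip_defrag_offset is an int).
 */
struct toy_skb {
	union {
		struct toy_sock *sk;
		int ip_defrag_offset;
	};
	struct toy_skb *next;
};

/*
 * Models queuing a fragment for reassembly: the skb is orphaned
 * (sk becomes NULL), then the fragment offset is stored, which
 * overwrites the bytes that sk also uses.
 */
void toy_defrag_queue(struct toy_skb *skb, int offset)
{
	assert(skb != NULL);
	skb->sk = NULL;                 /* effect of orphaning the skb */
	skb->ip_defrag_offset = offset; /* clobbers the sk storage */
}

/*
 * Models the "frag->sk must be NULL" sanity check that the report
 * says triggers in the fragmentation path. Returns 1 where the
 * kernel would hit the BUG, 0 otherwise.
 */
int toy_frag_would_bug(const struct toy_skb *frag)
{
	return frag->sk != NULL;
}

/* Models the proposed fix: clear sk on each fragment at reassembly. */
void toy_reasm_fixup(struct toy_skb *frag)
{
	frag->sk = NULL;
}
```

On common platforms where a null pointer is all-zero bits, calling toy_defrag_queue(&frag, 1480) on a zeroed toy_skb leaves frag.sk non-NULL, so toy_frag_would_bug(&frag) returns 1. After toy_reasm_fixup(&frag) it returns 0. The offset 1480 is an arbitrary non-zero example value; an offset of 0 would not show the problem, since the shared bytes would stay zero.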