On 04/13/2018 05:39 PM, Subash Abhinov Kasiviswanathan wrote:
> We are seeing a warning followed by a crash on an ARM64 device with
> Android 4.14 based kernel.
>
> It looks like both sk->sk_write_queue and sk->sk_send_head are NULL.
> Since the sk->sk_write_queue is NULL and is dereferenced in tcp_rto_delta_us()
> to get the skb->skb_mstamp, there is crash observed.
>
> Since this is 4.14.32, it already has commit ("tcp: reset sk_send_head in
> tcp_write_queue_purge")
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/?h=v4.14.34&id=dbbf2d1e4077bab0c65ece2765d3fc69cf7d610f
>
> 12876.013077: <6> WARNING: CPU: 5 PID: 14828 at net/ipv4/tcp_output.c:2469
> tcp_send_loss_probe+0x198/0x1b8
> 12876.038939: <6> task: ffffffe73f7e5a80 task.stack: ffffff801b068000
> 12876.038941: <2> PC is at tcp_send_loss_probe+0x198/0x1b8
> 12876.038942: <2> LR is at tcp_send_loss_probe+0x28/0x1b8
> 12876.038944: <2> pc : [<ffffff8dc8db16f8>] lr : [<ffffff8dc8db1588>]
> pstate: 60400145
> 12876.038944: <2> sp : ffffff800802bd30
> 12876.038945: <2> x29: ffffff800802bd50 x28: ffffff8dc9d83eb8
> 12876.038948: <2> x27: ffffff800802be08 x26: ffffff8dc9737000
> 12876.038950: <2> x25: 0000000000000001 x24: ffffffe744ea1728
> 12876.038952: <2> x23: ffffffe73f7e5a80 x22: 0000000000000558
> 12876.038954: <2> x21: 0000000000000000 x20: 0000000001080020
> 12876.038956: <2> x19: ffffffe73d06e440 x18: 0000000000000020
> 12876.038958: <2> x17: 0000000000000014 x16: 0000000000000030
> 12876.038960: <2> x15: 0000000000000000 x14: 0000000000000000
> 12876.038962: <2> x13: 0000000013af314c x12: 0000002773f8a550
> 12876.038965: <2> x11: 0000000000000538 x10: 0000000000000000
> 12876.038967: <2> x9 : 0000000000000020 x8 : ffffffe73d06e5f0
> 12876.038969: <2> x7 : 0000000000924278 x6 : ffffffe76fe9ed80
> 12876.038971: <2> x5 : ffffffe76fe9ed80 x4 : 000000000f500458
> 12876.038973: <2> x3 : ffffff800802bce0 x2 : ffffff800802bce8
> 12876.038975: <2> x1 : 0000000000000000 x0 : 0000000000000558
> 12876.039082: <2> [<ffffff8dc8db16f8>] tcp_send_loss_probe+0x198/0x1b8
> 12876.039084: <2> [<ffffff8dc8db6698>] tcp_write_timer_handler+0xf8/0x1c4
> 12876.039086: <2> [<ffffff8dc8db68cc>] tcp_write_timer+0x5c/0x98
> 12876.039089: <2> [<ffffff8dc8144f10>] call_timer_fn+0xc0/0x1b4
> 12876.039091: <2> [<ffffff8dc8143f68>] run_timer_softirq+0x230/0x850
> 12876.039094: <2> [<ffffff8dc8081b74>] __do_softirq+0x1dc/0x3a4
> 12876.039096: <2> [<ffffff8dc80b8fec>] irq_exit+0xc8/0xd4
> 12876.039098: <2> [<ffffff8dc812a004>] __handle_domain_irq+0x8c/0xc4
> 12876.039099: <2> [<ffffff8dc8081940>] gic_handle_irq+0x164/0x1bc
>
> [net/ipv4/tcp_output.c]
> void tcp_send_loss_probe(struct sock *sk)
> {
>         struct tcp_sock *tp = tcp_sk(sk);
>         struct sk_buff *skb;
>         int pcount;
>         int mss = tcp_current_mss(sk);
>
>         ...
>
>         /* Retransmit last segment. */
>         if (WARN_ON(!skb))
>                 goto rearm_timer;
>
> 12876.043967: <6> Unable to handle kernel NULL pointer dereference at
> virtual address 00000010
> 12876.091600: <6> Internal error: Oops: 96000005 [#1] PREEMPT SMP
> 12876.152597: <2> PC is at tcp_rearm_rto+0x48/0x90
> 12876.156979: <2> LR is at tcp_send_loss_probe+0x178/0x1b8
> 12876.162077: <2> pc : [<ffffff8dc8da76c4>] lr : [<ffffff8dc8db16d8>]
> pstate: 60400145
> 12876.169657: <2> sp : ffffff800802bd10
> 12876.173056: <2> x29: ffffff800802bd20 x28: ffffff8dc9d83eb8
> 12876.178511: <2> x27: ffffff800802be08 x26: ffffff8dc9737000
> 12876.183967: <2> x25: 0000000000000001 x24: ffffffe744ea1728
> 12876.189418: <2> x23: ffffffe73f7e5a80 x22: 0000000000000558
> 12876.194863: <2> x21: 0000000000000000 x20: 0000000001080020
> 12876.200312: <2> x19: ffffffe73d06e440 x18: 0000000000000020
> 12876.205758: <2> x17: 0000000000000014 x16: 0000000000000030
> 12876.211212: <2> x15: 0000000000000000 x14: 0000000000000000
> 12876.216660: <2> x13: 0000000013af314c x12: 0000002773f8a550
> 12876.222108: <2> x11: 0000000000000538 x10: 0000000000000000
> 12876.227561: <2> x9 : 0000000000000000 x8 : ffffffe73d06e5f0
> 12876.233008: <2> x7 : 0000000000924278 x6 : ffffffe76fe9ed80
> 12876.238455: <2> x5 : ffffffe76fe9ed80 x4 : 000000000f500458
> 12876.243907: <2> x3 : ffffff800802bce0 x2 : ffffff800802bce8
> 12876.249360: <2> x1 : 0000000000000000 x0 : 0000000000000867
> 12876.473522: <2> [<ffffff8dc8da76c4>] tcp_rearm_rto+0x48/0x90
> 12876.478971: <2> [<ffffff8dc8db16d8>] tcp_send_loss_probe+0x178/0x1b8
> 12876.485131: <2> [<ffffff8dc8db6698>] tcp_write_timer_handler+0xf8/0x1c4
> 12876.491557: <2> [<ffffff8dc8db68cc>] tcp_write_timer+0x5c/0x98
> 12876.497189: <2> [<ffffff8dc8144f10>] call_timer_fn+0xc0/0x1b4
> 12876.502731: <2> [<ffffff8dc8143f68>] run_timer_softirq+0x230/0x850
> 12876.508716: <2> [<ffffff8dc8081b74>] __do_softirq+0x1dc/0x3a4
> 12876.514260: <2> [<ffffff8dc80b8fec>] irq_exit+0xc8/0xd4
> 12876.519261: <2> [<ffffff8dc812a004>] __handle_domain_irq+0x8c/0xc4
> 12876.525245: <2> [<ffffff8dc8081940>] gic_handle_irq+0x164/0x1bc
>
> [net/ipv4/tcp_input.c]
> void tcp_rearm_rto(struct sock *sk)
> {
> ...
>                 inet_csk_clear_xmit_timer(sk, ICSK_TIME_RETRANS);
>         } else {
>                 u32 rto = inet_csk(sk)->icsk_rto;
>                 /* Offset the time elapsed after installing regular RTO */
>                 if (icsk->icsk_pending == ICSK_TIME_REO_TIMEOUT ||
>                     icsk->icsk_pending == ICSK_TIME_LOSS_PROBE) {
>                         s64 delta_us = tcp_rto_delta_us(sk);
>                         /* delta_us may not be positive if the socket is locked
>                          * when the retrans timer fires and is rescheduled.
>                          */
>                         rto = usecs_to_jiffies(max_t(int, delta_us, 1));
>
>
> [include/net/tcp.h]
> static inline s64 tcp_rto_delta_us(const struct sock *sk)
> {
>         const struct sk_buff *skb = tcp_write_queue_head(sk);
>         u32 rto = inet_csk(sk)->icsk_rto;
>         u64 rto_time_stamp_us = skb->skb_mstamp + jiffies_to_usecs(rto);
>
>         return rto_time_stamp_us - tcp_sk(sk)->tcp_mstamp;
> }
>

We probably need to clear tp->packets_out :/
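
To illustrate the idea, here is a rough userspace model, not kernel code and not a tested patch. All model_* names are hypothetical stand-ins. It assumes the branch elided by "..." in the quoted tcp_rearm_rto() is the one taken when tp->packets_out is zero, so a purge that empties the write queue but leaves packets_out non-zero would send the timer path to a NULL queue head.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel structures. */
struct model_skb {
	unsigned long long skb_mstamp;
};

struct model_sock {
	struct model_skb *write_queue_head;	/* models the head of sk->sk_write_queue */
	unsigned int packets_out;		/* models tp->packets_out */
};

/*
 * Models a write queue purge. With clear_packets_out == 0 only the queue
 * is emptied; with clear_packets_out != 0 the counter is reset as well,
 * which is the change suggested above.
 */
void model_write_queue_purge(struct model_sock *sk, int clear_packets_out)
{
	assert(sk != NULL);
	sk->write_queue_head = NULL;
	if (clear_packets_out)
		sk->packets_out = 0;
}

/*
 * Models the branch structure of the quoted tcp_rearm_rto().
 * Returns  0 when the timer would simply be cleared,
 *          1 when the queue head timestamp can be read safely,
 *         -1 when the read would hit a NULL head (the reported oops).
 * ASSUMPTION: the elided condition is "packets_out == 0".
 */
int model_rearm_rto(const struct model_sock *sk)
{
	if (sk->packets_out == 0)
		return 0;
	if (sk->write_queue_head == NULL)
		return -1;
	return 1;
}
```

In this model, purging without resetting the counter makes model_rearm_rto() return -1, while purging with the reset makes it return 0, i.e. the timer is cleared and the queue head is never touched.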