Re: Kernel panics in tcp_twclose

Julien Charbon Mon, 28 Sep 2015 01:24:08 -0700

 Hi Palle,

On 25/09/15 16:14, Palle Girgensohn wrote:
>> 24 sep 2015 kl. 11:39 skrev Palle Girgensohn <gir...@freebsd.org>:
>>> 24 sep 2015 kl. 09:57 skrev Julien Charbon <j...@freebsd.org>: On
>>> 24/09/15 09:03, Julien Charbon wrote:
>>>> On 24/09/15 08:55, Palle Girgensohn wrote:
>>>>>> 24 sep 2015 kl. 07:51 skrev Palle Girgensohn 
>>>>>> <gir...@pingpong.net>:
>>>>>>> 24 sep 2015 kl. 00:05 skrev Palle Girgensohn 
>>>>>>> <gir...@pingpong.net>:
>>>>>>>> 23 sep 2015 kl. 20:32 skrev Julien Charbon
>>>>>>>> <j...@freebsd.org>: On 23/09/15 20:26, Palle Girgensohn
>>>>>>>> wrote:
>>>>>>> Kernels and userland are updated to 10.2-p3 with the
>>>>>>> patch removing the suspicious KASSERT. dtrace is running
>>>>>>> continuously, with output redirected to a log file.
>>>>> Just had a crash. Unfortunately, the kernel was stuck at the
>>>>> db> prompt, and the remote keyboard was unresponsive (HP ILO,
>>>>> not impressed). So I had to reset the power and never got a
>>>>> core dump...
>>>>> 
>>>>> panic: tcp_tw_2msl_stop: inp should not be released here
>>>>> cpuid = 0
>>>>> KDB: stack backtrace:
>>>>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe175acd16a0
>>>>> kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe175acd1750
>>>>> vpanic() at vpanic+0x126/frame 0xfffffe175acd1790
>>>>> kassert_panic() at kassert_panic+0x139/frame 0xfffffe175acd1800
>>>>> tcp_twclose() at tcp_twclose+0x2cb/frame 0xfffffe175acd1850
>>>>> tcp_tw_2msl_scan() at tcp_tw_2msl_scan+0x13b/frame 0xfffffe175acd1890
>>>>> tcp_slowtimo() at tcp_slowtimo+0x68/frame 0xfffffe175acd18c0
>>>>> pfslowtimo() at pfslowtimo+0x54/frame 0xfffffe175acd18f0
>>>>> softclock_call_cc() at softclock_call_cc+0x193/frame 0xfffffe175acd19d0
>>>>> softclock() at softclock+0x47/frame 0xfffffe175acd19f0
>>>>> intr_event_execute_handlers() at intr_event_execute_handlers+0x93/frame 0xfffffe175acd1a30
>>>>> ithread_loop() at ithread_loop+0xa6/frame 0xfffffe175acd1a70
>>>>> fork_exit() at fork_exit+0x84/frame 0xfffffe175acd1ab0
>>>>> fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe175acd1ab0
>>>>> --- trap 0, rip = 0, rsp = 0xfffffe175acd1b70, rbp = 0 ---
>>>>> KDB: enter: panic
>>>>> [ thread pid 12 tid 100043 ]
>>>>> Stopped at      kdb_enter+0x3e: movq    $0,kdb_why
>>>>> db>
>>>> 
>>>> Thanks a lot for this backtrace.  This is what I expected:
>>>> when tcp_close() is called in the INP_TIMEWAIT case, in_pcbfree()
>>>> can be called one extra time, which leads to:
>>>> 
>>>> tcp_tw_2msl_stop: inp should not be released here
>>>> 
>>>> Let me try to come up with a tentative fix for this case.
>>> 
>>> Attached is my tentative patch for this case.  It is only a
>>> first attempt, as I am still waiting for feedback from -net on
>>> what the rule should be here.
>>> 
>>> By the way:
>>> 
>>> - I see nothing specific to VIMAGE here
>>> 
>>> - Is anyone aware of tcp_close() (or tcp_drop()) calls
>>> modified or introduced recently in 10.2 that could explain why
>>> this issue appears only now?
>> 
>> Running a machine with the patch now (it just crashed and rebooted
>> with the new kernel).
>> 
>> Hoping it will have a "soothing" effect... ;-)
>> 
>> dtrace running as previously. No output yet, though.
> 
> First off, loud cheers and a big *thank you* to Julien for helping us
> get our systems to stop crashing. This really means a lot to us!
> Thank you!


 Glad to see your system is more stable now.  You are welcome, and thank
you for reporting this issue so accurately.  We got lucky that it took
/only/ three different kernel panics to get a good overview.  This part
of the code is quite tricky, as three entangled layers each try to
clean up their own state the right way: socket, inp and tcptw.
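
 To illustrate the kind of failure involved, here is a toy model in C.
It is only a sketch under my own assumptions, not the actual kernel code:
struct toy_pcb, toy_pcb_init() and toy_pcb_release() are hypothetical
names, and the real inp reference handling is more involved.  Each layer
holds one reference and drops it when it cleans up; a release that
arrives after the object is already freed is the "one extra time" case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a control block; NOT the real struct inpcb. */
struct toy_pcb {
    unsigned refcount;    /* references still held by the layers */
    bool     freed;       /* set once the last reference is dropped */
    unsigned extra_frees; /* releases attempted after the free */
};

/* Start with one reference per layer that will later clean up. */
static void toy_pcb_init(struct toy_pcb *p, unsigned refs)
{
    assert(p != NULL);
    p->refcount = refs;
    p->freed = (refs == 0);
    p->extra_frees = 0;
}

/*
 * Drop one reference.  Returns true only when this call freed the
 * object.  A release on an already freed object is recorded instead
 * of being acted on; in a kernel this is where an assertion fires.
 */
static bool toy_pcb_release(struct toy_pcb *p)
{
    if (p->freed || p->refcount == 0) {
        p->extra_frees++;
        return false;
    }
    if (--p->refcount == 0) {
        p->freed = true;
        return true;
    }
    return false;
}
```

 With two references, the second release frees the object and a third
one is counted in extra_frees rather than freeing it again.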

> Dtrace still shows nothing.

 I will try to provide you with a more generic DTrace script; it seems
the current one is too specific.

--
Julien
