Hi Palle,

On 25/09/15 16:14, Palle Girgensohn wrote:
>> On 24 Sep 2015, at 11:39, Palle Girgensohn <gir...@freebsd.org> wrote:
>>> On 24 Sep 2015, at 09:57, Julien Charbon <j...@freebsd.org> wrote:
>>> On 24/09/15 09:03, Julien Charbon wrote:
>>>> On 24/09/15 08:55, Palle Girgensohn wrote:
>>>>>> On 24 Sep 2015, at 07:51, Palle Girgensohn <gir...@pingpong.net> wrote:
>>>>>>> On 24 Sep 2015, at 00:05, Palle Girgensohn <gir...@pingpong.net> wrote:
>>>>>>>> On 23 Sep 2015, at 20:32, Julien Charbon <j...@freebsd.org> wrote:
>>>>>>>> On 23/09/15 20:26, Palle Girgensohn wrote:
>>>>>>> Kernels and userland are updated to 10.2-p3 with the patch removing
>>>>>>> the suspicious KASSERT. dtrace is running continuously, redirecting
>>>>>>> to a log file.
>>>>> Just had a crash. Unfortunately, the kernel was stuck at the db>
>>>>> prompt, and the remote keyboard was unresponsive (HP iLO, not
>>>>> impressed). So I had to reset the power and never got a core dump...
>>>>>
>>>>> panic: tcp_tw_2msl_stop: inp should not be released here
>>>>> cpuid = 0
>>>>> KDB: stack backtrace:
>>>>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe175acd16a0
>>>>> kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe175acd1750
>>>>> vpanic() at vpanic+0x126/frame 0xfffffe175acd1790
>>>>> kassert_panic() at kassert_panic+0x139/frame 0xfffffe175acd1800
>>>>> tcp_twclose() at tcp_twclose+0x2cb/frame 0xfffffe175acd1850
>>>>> tcp_tw_2msl_scan() at tcp_tw_2msl_scan+0x13b/frame 0xfffffe175acd1890
>>>>> tcp_slowtimo() at tcp_slowtimo+0x68/frame 0xfffffe175acd18c0
>>>>> pfslowtimo() at pfslowtimo+0x54/frame 0xfffffe175acd18f0
>>>>> softclock_call_cc() at softclock_call_cc+0x193/frame 0xfffffe175acd19d0
>>>>> softclock() at softclock+0x47/frame 0xfffffe175acd19f0
>>>>> intr_event_execute_handlers() at intr_event_execute_handlers+0x93/frame 0xfffffe175acd1a30
>>>>> ithread_loop() at ithread_loop+0xa6/frame 0xfffffe175acd1a70
>>>>> fork_exit() at fork_exit+0x84/frame 0xfffffe175acd1ab0
>>>>> fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe175acd1ab0
>>>>> --- trap 0, rip = 0, rsp = 0xfffffe175acd1b70, rbp = 0 ---
>>>>> KDB: enter: panic
>>>>> [ thread pid 12 tid 100043 ]
>>>>> Stopped at kdb_enter+0x3e: movq $0,kdb_why
>>>>> db>
>>>>
>>>> Thanks a lot for this backtrace. This is what I expected: when
>>>> tcp_close() is called in the INP_TIMEWAIT case, in_pcbfree() can be
>>>> called one extra time, which leads to:
>>>>
>>>> tcp_tw_2msl_stop: inp should not be released here
>>>>
>>>> Let me try to come up with a tentative fix for this case.
>>>
>>> See attached my tentative patch for this case. It is only a first
>>> tentative patch, as I am still waiting on -net feedback on what the
>>> rule should be here.
>>>
>>> By the way:
>>>
>>> - I see nothing specific to VIMAGE here
>>>
>>> - Is anyone aware of tcp_close() (or tcp_drop()) calls modified or
>>> introduced recently in 10.2 that could explain why this issue only
>>> appears now?
>>
>> Running a machine with the patch now (it just crashed and rebooted
>> with the new kernel).
>>
>> Hoping it will have a "soothing" effect... ;-)
>>
>> dtrace running as previously. No output yet, though.
>
> First off, loud cheers and a big *thank you* to Julien for helping us
> get our systems to stop crashing. This really means a lot to us!
> Thank you!
Glad to see your system is more stable now. You are welcome, and thanks
to you for reporting this issue so accurately. We got lucky that it took
/only/ three different kernel panics to get a good overview. This part of
the code is quite tricky, as there are three entangled layers that each
try to clean up their own state the right way: socket, inp and tcptw.

> Dtrace still shows nothing.

I will try to provide you with a more generic DTrace script; the current
one seems too specific.

--
Julien