> On 25 Sep 2015, at 16:19, Palle Girgensohn <gir...@freebsd.org> wrote:
>
>>
>> On 25 Sep 2015, at 16:14, Palle Girgensohn <gir...@freebsd.org> wrote:
>>
>>>
>>> On 24 Sep 2015, at 11:39, Palle Girgensohn <gir...@freebsd.org> wrote:
>>>
>>>
>>>> On 24 Sep 2015, at 09:57, Julien Charbon <j...@freebsd.org> wrote:
>>>>
>>>>
>>>> Hi -net,
>>>>
>>>> On 24/09/15 09:03, Julien Charbon wrote:
>>>>> On 24/09/15 08:55, Palle Girgensohn wrote:
>>>>>>> On 24 Sep 2015, at 07:51, Palle Girgensohn
>>>>>>> <gir...@pingpong.net> wrote:
>>>>>>>> On 24 Sep 2015, at 00:05, Palle Girgensohn
>>>>>>>> <gir...@pingpong.net> wrote:
>>>>>>>>> On 23 Sep 2015, at 20:32, Julien Charbon <j...@freebsd.org> wrote:
>>>>>>>>> On 23/09/15 20:26, Palle Girgensohn wrote:
>>>>>>>> Kernels and userland are updated to 10.2-p3 with the patch
>>>>>>>> removing the suspicious KASSERT.
>>>>>>>> dtrace is running continuously, redirecting to a log file.
>>>>>> Just had a crash. Unfortunately, the kernel was stuck at the db>
>>>>>> prompt, and the remote keyboard was unresponsive (HP ILO, not
>>>>>> impressed). So I had to reset the power and never got a core dump...
>>>>>>
>>>>>> panic: tcp_tw_2msl_stop: inp should not be released here
>>>>>> cpuid = 0
>>>>>> KDB: stack backtrace:
>>>>>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe175acd16a0
>>>>>> kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe175acd1750
>>>>>> vpanic() at vpanic+0x126/frame 0xfffffe175acd1790
>>>>>> kassert_panic() at kassert_panic+0x139/frame 0xfffffe175acd1800
>>>>>> tcp_twclose() at tcp_twclose+0x2cb/frame 0xfffffe175acd1850
>>>>>> tcp_tw_2msl_scan() at tcp_tw_2msl_scan+0x13b/frame 0xfffffe175acd1890
>>>>>> tcp_slowtimo() at tcp_slowtimo+0x68/frame 0xfffffe175acd18c0
>>>>>> pfslowtimo() at pfslowtimo+0x54/frame 0xfffffe175acd18f0
>>>>>> softclock_call_cc() at softclock_call_cc+0x193/frame 0xfffffe175acd19d0
>>>>>> softclock() at softclock+0x47/frame 0xfffffe175acd19f0
>>>>>> intr_event_execute_handlers() at intr_event_execute_handlers+0x93/frame 0xfffffe175acd1a30
>>>>>> ithread_loop() at ithread_loop+0xa6/frame 0xfffffe175acd1a70
>>>>>> fork_exit() at fork_exit+0x84/frame 0xfffffe175acd1ab0
>>>>>> fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe175acd1ab0
>>>>>> --- trap 0, rip = 0, rsp = 0xfffffe175acd1b70, rbp = 0 ---
>>>>>> KDB: enter: panic
>>>>>> [ thread pid 12 tid 100043 ]
>>>>>> Stopped at kdb_enter+0x3e: movq $0,kdb_why
>>>>>> db>
>>>>>
>>>>> Thanks a lot for this backtrace. This is what I expected: when
>>>>> tcp_close() is called in the INP_TIMEWAIT case, in_pcbfree() can be
>>>>> called one extra time, which leads to:
>>>>>
>>>>> tcp_tw_2msl_stop: inp should not be released here
>>>>>
>>>>> Let me try to come up with a tentative fix for this case.
>>>>
>>>> Please find attached my tentative patch for this case. It is only a first
>>>> tentative patch, as I am still waiting for feedback from -net on what the
>>>> rule should be here.
>>>>
>>>> By the way:
>>>>
>>>> - I see nothing specific to VIMAGE here
>>>>
>>>> - Is anyone aware of tcp_close() (or tcp_drop()) calls modified or introduced
>>>> recently in 10.2 that could explain why this issue appears only now?
>>>>
>>>> --
>>>> Julien
>>>> <tcp-close-fix-v1.patch>
>>>
>>>
>>> Running a machine with the patch now (it just crashed and rebooted with the
>>> new kernel).
>>>
>>> Hoping it will have a "soothing" effect... ;-)
>>>
>>>
>>> dtrace running as previously. No output yet, though.
>>>
>>>
>>
>> Hello -net & Julien!
>>
>> First off, loud cheers and a big *thank you* to Julien for helping us get our
>> systems to stop crashing. This really means a lot to us! Thank you!
>>
>> We have been running for more than 24 hours with no crash, so I'm getting more
>> and more confident that the change actually makes the system stable.
>>
>> Dtrace still shows nothing.
>>
>> Palle
>
>
> Secondly, is this error related? This is *not* VIMAGE, *not* jail. It is a
> binary-installed GENERIC kernel from freebsd-update, 10.1-RELEASE-p19. It just
> crashed today, and we did not get any core dump, but I found this core.txt
> from a crash in August that I was not aware of (I was on holiday then... :)
>
> Since it is a binary install, I have no kernel.debug.
>
> ...
>
> panic: sbsndptr: sockbuf 0xfffff80312126c68 and mbuf 0xfffff800b4a36800 clashing
>
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for details.
> This GDB was configured as "amd64-marcel-freebsd"...
>
> Unread portion of the kernel message buffer:
> panic: sbsndptr: sockbuf 0xfffff80312126c68 and mbuf 0xfffff800b4a36800 clashing
> cpuid = 1
> KDB: stack backtrace:
> #0 0xffffffff80963000 at kdb_backtrace+0x60
> #1 0xffffffff80928125 at panic+0x155
> #2 0xffffffff8099c180 at sbdroprecord_locked+0
> #3 0xffffffff80ac8c9c at tcp_output+0xdbc
> #4 0xffffffff80ac6a95 at tcp_do_segment+0x3045
> #5 0xffffffff80ac2e04 at tcp_input+0xd04
> #6 0xffffffff80a54fc7 at ip_input+0x97
> #7 0xffffffff809f4f73 at swi_net+0x143
> #8 0xffffffff808faf4b at intr_event_execute_handlers+0xab
> #9 0xffffffff808fb396 at ithread_loop+0x96
> #10 0xffffffff808f8b6a at fork_exit+0x9a
> #11 0xffffffff80d0b67e at fork_trampoline+0xe
> Uptime: 21d0h54m53s
> Dumping 2005 out of 32709 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
>
> Reading symbols from /boot/kernel/accf_data.ko.symbols...done.
> Loaded symbols for /boot/kernel/accf_data.ko.symbols
> Reading symbols from /boot/kernel/accf_http.ko.symbols...done.
> Loaded symbols for /boot/kernel/accf_http.ko.symbols
> Reading symbols from /boot/kernel/oce.ko.symbols...done.
> Loaded symbols for /boot/kernel/oce.ko.symbols
> Reading symbols from /boot/kernel/nullfs.ko.symbols...done.
> Loaded symbols for /boot/kernel/nullfs.ko.symbols
> Reading symbols from /boot/kernel/linprocfs.ko.symbols...done.
> Loaded symbols for /boot/kernel/linprocfs.ko.symbols
> Reading symbols from /boot/kernel/linux.ko.symbols...done.
> Loaded symbols for /boot/kernel/linux.ko.symbols
> Reading symbols from /boot/kernel/zfs.ko.symbols...done.
> Loaded symbols for /boot/kernel/zfs.ko.symbols
> Reading symbols from /boot/kernel/opensolaris.ko.symbols...done.
> Loaded symbols for /boot/kernel/opensolaris.ko.symbols
> #0 doadump (textdump=<value optimized out>) at pcpu.h:219
> 219 pcpu.h: No such file or directory.
>     in pcpu.h
> (kgdb) #0 doadump (textdump=<value optimized out>) at pcpu.h:219
> #1 0xffffffff80927da2 in kern_reboot (howto=260)
>     at /usr/src/sys/kern/kern_shutdown.c:452
> #2 0xffffffff80928164 in panic (fmt=<value optimized out>)
>     at /usr/src/sys/kern/kern_shutdown.c:759
> #3 0xffffffff8099c180 in sbsndptr (sb=<value optimized out>,
>     off=<value optimized out>, len=<value optimized out>,
>     moff=<value optimized out>) at /usr/src/sys/kern/uipc_sockbuf.c:1011
> #4 0xffffffff80ac8c9c in tcp_output (tp=0xfffff80312ef5800)
>     at /usr/src/sys/netinet/tcp_output.c:870
> #5 0xffffffff80ac6a95 in tcp_do_segment (m=<value optimized out>,
>     th=<value optimized out>, so=<value optimized out>,
>     tp=<value optimized out>, drop_hdrlen=<value optimized out>, tlen=0,
>     iptos=<value optimized out>, ti_locked=Cannot access memory at address 0x1
> )
>     at /usr/src/sys/netinet/tcp_input.c:3018
> #6 0xffffffff80ac2e04 in tcp_input (m=<value optimized out>,
>     off0=<value optimized out>) at /usr/src/sys/netinet/tcp_input.c:1377
> #7 0xffffffff80a54fc7 in ip_input (m=0xfffff800b4516600)
>     at /usr/src/sys/netinet/ip_input.c:734
> #8 0xffffffff809f4f73 in swi_net (arg=0xffffffff81988880)
>     at /usr/src/sys/net/netisr.c:765
> #9 0xffffffff808faf4b in intr_event_execute_handlers (
>     p=<value optimized out>, ie=0xfffff800093ac600)
>     at /usr/src/sys/kern/kern_intr.c:1263
> #10 0xffffffff808fb396 in ithread_loop (arg=0xfffff80009388e40)
>     at /usr/src/sys/kern/kern_intr.c:1276
> #11 0xffffffff808f8b6a in fork_exit (
>     callout=0xffffffff808fb300 <ithread_loop>, arg=0xfffff80009388e40,
>     frame=0xfffffe083c3e3ac0) at /usr/src/sys/kern/kern_fork.c:996
> #12 0xffffffff80d0b67e in fork_trampoline ()
>     at /usr/src/sys/amd64/amd64/exception.S:606
> #13 0x0000000000000000 in ?? ()
> Current language: auto; currently minimal
> (kgdb)

Hi Julien and -net,

A sunny Monday, no crashes since the patch was applied. Great! Big thanks again!

We still have nothing in the dtrace log, though.

And I wonder if the above crash could possibly be a result of hitting that same bug?

Palle