I have filed a PR with this patch so that it doesn't get overlooked.

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=216731

-jr

On Thu, 26 Jan 2017 10:20:17 -0500 "J.R. Oldroyd" <f...@opal.com> wrote:
>
> Sorry for the time gap, I had to deal with family matters.
>
> OK, I patched if_lagg.c to drop and re-acquire the lock around
> the call to init the underlying driver. I've been running this
> for some weeks now and haven't seen the boot-hang since. Hopefully
> I have tested long enough.
>
> Someone more familiar with this driver and use of this lock there
> should review this patch and comment.
>
> -jr
>
>
> Index: sys/net/if_lagg.c
> ===================================================================
> --- sys/net/if_lagg.c	(revision 307319)
> +++ sys/net/if_lagg.c	(working copy)
> @@ -995,6 +995,21 @@
>  		LAGG_RUNLOCK(sc, &tracker);
>  		break;
>  
> +	case SIOCADDMULTI:
> +	case SIOCDELMULTI:
> +		/*
> +		 * Drivers like if_re.c cause a LOR on WLOCK, so we must
> +		 * drop and re-acquire the lock around the call.
> +		 */
> +		if (lp->lp_ioctl == NULL) {
> +			error = EINVAL;
> +			break;
> +		}
> +		LAGG_WUNLOCK(sc);
> +		error = (*lp->lp_ioctl)(ifp, cmd, data);
> +		LAGG_WLOCK(sc);
> +		break;
> +
>  	case SIOCSIFCAP:
>  		if (lp->lp_ioctl == NULL) {
>  			error = EINVAL;
>
>
> On Wed, 28 Dec 2016 00:24:09 -0800 Adrian Chadd <adrian.ch...@gmail.com>
> wrote:
> >
> > hi,
> >
> > yes, the LOR is why the boot hang occurs :(
> >
> >
> >
> > -a
> >
> >
> > On 27 December 2016 at 14:30, J.R. Oldroyd <f...@opal.com> wrote:
> > > Sorry, Adrian, I'm missing the back-story here and I'm not that
> > > familiar with the lagg code.
> > >
> > > Are you saying that this LOR is likely relevant to this boot hang,
> > > or are you saying that this is a known problem that's not relevant?
> > >
> > > Jan Kokemüller posted some lagg patches. I don't know if they are
> > > likely applicable to this problem, but I could try those.
> > >
> > > https://reviews.freebsd.org/D6845
> > > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211689#c4
> > >
> > > The first removes an RLOCK, but not the one referenced in the LOR
> > > report.
> > > The second is a patch for the ath/iwm panic. If you're
> > > unfamiliar with them, I will study up on this code and patches
> > > to get up to speed on it.
> > >
> > > -jr
> > >
> > >
> > > On Fri, 23 Dec 2016 11:41:33 -0800 Adrian Chadd <adrian.ch...@gmail.com>
> > > wrote:
> > >>
> > >> Right, that's the known lock order issue with lagg. :(
> > >>
> > >>
> > >> -adrian
> > >>
> > >>
> > >> On 23 December 2016 at 11:37, J.R. Oldroyd <f...@opal.com> wrote:
> > >> > On Fri, 23 Dec 2016 10:17:34 -0800 Adrian Chadd
> > >> > <adrian.ch...@gmail.com> wrote:
> > >> >>
> > >> >> On 20 December 2016 at 08:18, J.R. Oldroyd <f...@opal.com> wrote:
> > >> >> > On Thu, 8 Dec 2016 17:19:26 -0500 "J.R. Oldroyd" <f...@opal.com>
> > >> >> > wrote:
> > >> >> >>
> > >> >> >> On Thu, 08 Dec 2016 21:29:32 +0200 "Andriy Voskoboinyk"
> > >> >> >> <s3er...@gmail.com> wrote:
> > >> >> >> >
> > >> >> >> > Thu, 08 Dec 2016 16:57:19 +0200, J.R. Oldroyd
> > >> >> >> > <f...@opal.com> wrote:
> > >> >> >> >
> > >> >> >> > Is there any additional output with
> > >> >> >> > wlandebug_wlan0="scan+state+auth+assoc"
> > >> >> >> > in /etc/rc.conf ?
> > >> >> >> >
> > >> >> >>
> > >> >> >> I have put that in and rebooted several times, all times OK.
> > >> >> >> I will report back again in due course when it next hangs.
> > >> >> >>
> > >> >> >> -jr
> > >> >> >>
> > >> >> >
> > >> >> > The boot hang occurred again today. I noted the point of the hang and
> > >> >> > rebooted; the log from the good boot with annotation of the previous hang
> > >> >> > point is here [1].
> > >> >> >
> > >> >> > -jr
> > >> >> >
> > >> >> > [1]
> > >> >> > http://opal.com/jr/freebsd/20161220-fbsd11.3-boot_hang_wlan_debug.txt
> > >> >> > _______________________________________________
> > >> >> > freebsd-wireless@freebsd.org mailing list
> > >> >> > https://lists.freebsd.org/mailman/listinfo/freebsd-wireless
> > >> >> > To unsubscribe, send any mail to
> > >> >> > "freebsd-wireless-unsubscr...@freebsd.org"
> > >> >>
> > >> >>
> > >> >> can you compile with witness and invariants? I'd like to see if it's
> > >> >> locking related.
> > >> >>
> > >> >> thanks
> > >> >>
> > >> >>
> > >> >> -adrian
> > >> >>
> > >> >>
> > >> >
> > >> > Hmm, maybe:
> > >> >
> > >> > Dec 23 14:30:34 shibato kernel: wlan0: ieee80211_swscan_add_scan: chan 11g min dwell met (2146895553 > 2146895553)
> > >> > Dec 23 14:30:34 shibato kernel: wlan0: scan_mindwell: called
> > >> > Dec 23 14:30:34 shibato kernel: wlan0: scan_curchan_task: loop start; scandone=0
> > >> > Dec 23 14:30:34 shibato kernel: wlan0: scan_curchan_task: chan 11g -> 7g [active, dwell min 20ms max 200ms]
> > >> > Dec 23 14:30:34 shibato kernel: wlan0: scan_curchan: calling; maxdwell=200
> > >> > Dec 23 14:30:34 shibato kernel: wlan0: scan_curchan_task: waiting
> > >> > Dec 23 14:30:34 shibato kernel: re0: link state changed to UP
> > >> > Dec 23 14:30:34 shibato kernel: lagg0: link state changed to UP
> > >> > Dec 23 14:30:34 shibato kernel: lock order reversal:
> > >> > Dec 23 14:30:34 shibato kernel: 1st 0xfffff800095d2208 if_lagg rmlock (if_lagg rmlock) @ /usr/src/sys/modules/if_lagg/../../net/if_lagg.c:1530
> > >> > Dec 23 14:30:34 shibato kernel: 2nd 0xfffffe0000e10218 re0 (network driver) @ dev/re/if_re.c:3433
> > >> > Dec 23 14:30:34 shibato kernel: stack backtrace:
> > >> > Dec 23 14:30:34 shibato kernel: #0 0xffffffff80a98b60 at witness_debugger+0x70
> > >> > Dec 23 14:30:34 shibato kernel: #1 0xffffffff80a98a54 at
> > >> > witness_checkorder+0xe54
> > >> > Dec 23 14:30:34 shibato kernel: #2 0xffffffff80a1c794 at __mtx_lock_flags+0xa4
> > >> > Dec 23 14:30:34 shibato kernel: #3 0xffffffff8078c279 at re_ioctl+0x3a9
> > >> > Dec 23 14:30:34 shibato kernel: #4 0xffffffff8222428e at lagg_port_ioctl+0xde
> > >> > Dec 23 14:30:34 shibato kernel: #5 0xffffffff80b20bbf at if_addmulti+0x39f
> > >> > Dec 23 14:30:34 shibato kernel: #6 0xffffffff82224708 at lagg_ether_cmdmulti+0x158
> > >> > Dec 23 14:30:34 shibato kernel: #7 0xffffffff822219dd at lagg_ioctl+0xdd
> > >> > Dec 23 14:30:34 shibato kernel: #8 0xffffffff80b20bbf at if_addmulti+0x39f
> > >> > Dec 23 14:30:34 shibato kernel: #9 0xffffffff80c35a97 at in6_mc_join_locked+0x1d7
> > >> > Dec 23 14:30:34 shibato kernel: #10 0xffffffff80c35715 at in6_joingroup+0x75
> > >> > Dec 23 14:30:34 shibato kernel: #11 0xffffffff80c2f9e9 at in6_update_ifa+0x1339
> > >> > Dec 23 14:30:34 shibato kernel: #12 0xffffffff80c33eb3 at in6_ifattach+0x413
> > >> > Dec 23 14:30:34 shibato kernel: #13 0xffffffff80b1fd84 at ifioctl+0xfe4
> > >> > Dec 23 14:30:34 shibato kernel: #14 0xffffffff80a9d946 at kern_ioctl+0x246
> > >> > Dec 23 14:30:34 shibato kernel: #15 0xffffffff80a9d691 at sys_ioctl+0x171
> > >> > Dec 23 14:30:34 shibato kernel: #16 0xffffffff80e9d40b at amd64_syscall+0x2db
> > >> > Dec 23 14:30:34 shibato kernel: #17 0xffffffff80e7d8ab at Xfast_syscall+0xfb
> > >> > Dec 23 14:30:34 shibato kernel: wlan0: scan_curchan_task: loop start; scandone=0
> > >> > Dec 23 14:30:34 shibato kernel: wlan0: scan_curchan_task: chan 7g -> 36a [active, dwell min 20ms max 200ms]
> > >> > Dec 23 14:30:34 shibato kernel: wlan0: scan_curchan: calling; maxdwell=200
> > >> > Dec 23 14:30:34 shibato kernel: wlan0: scan_curchan_task: waiting
> > >> >
> > >> > This boot then continued normally, no hang.
> > >> >
> > >> > -jr
> > >
>
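
For readers unfamiliar with the pattern the patch uses, here is a minimal
user-space sketch of "drop the outer lock around the call into the driver".
It is not the if_lagg.c code: every name in it (toy_lock, toy_softc,
toy_driver_ioctl, toy_port_ioctl) is hypothetical, the "locks" only record
whether they are held, and the model is single-threaded. It only shows that
the driver lock is no longer taken while the aggregate lock is held.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy lock: records only whether it is held (no real synchronization). */
struct toy_lock {
	bool held;
};

static void toy_lock_acquire(struct toy_lock *l) { assert(!l->held); l->held = true; }
static void toy_lock_release(struct toy_lock *l) { assert(l->held); l->held = false; }

/* Hypothetical state: an aggregate lock, a driver lock, a reversal counter. */
struct toy_softc {
	struct toy_lock agg_lock;	/* stands in for the lagg lock */
	struct toy_lock drv_lock;	/* stands in for the driver lock */
	int reversals;			/* driver lock taken under agg_lock */
};

typedef int (*toy_ioctl_fn)(struct toy_softc *, int);

/* Toy driver ioctl: takes its own lock, counting a reversal if agg_lock is held. */
static int
toy_driver_ioctl(struct toy_softc *sc, int cmd)
{
	(void)cmd;
	if (sc->agg_lock.held)
		sc->reversals++;
	toy_lock_acquire(&sc->drv_lock);
	/* A real driver would reprogram its multicast filter here. */
	toy_lock_release(&sc->drv_lock);
	return (0);
}

/*
 * Pass an ioctl down to the driver.  With drop_lock set, agg_lock is
 * released around the call and re-acquired afterwards.
 */
static int
toy_port_ioctl(struct toy_softc *sc, toy_ioctl_fn fn, int cmd, bool drop_lock)
{
	int error;

	toy_lock_acquire(&sc->agg_lock);
	if (fn == NULL) {
		toy_lock_release(&sc->agg_lock);
		return (EINVAL);
	}
	if (drop_lock)
		toy_lock_release(&sc->agg_lock);
	error = fn(sc, cmd);
	if (drop_lock)
		toy_lock_acquire(&sc->agg_lock);
	/* Anything agg_lock protects may have changed while it was dropped. */
	toy_lock_release(&sc->agg_lock);
	return (error);
}
```

Calling toy_port_ioctl() with drop_lock false bumps the reversal counter;
with drop_lock true the counter is left alone. The cost of the pattern is
the window in which the outer lock is not held, so any state it protects
has to be treated as possibly stale after re-acquiring it, which is one
reason a review by someone who knows the driver was requested.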