On Jan 15, 2013, at 5:07 PM, Steven Hartland wrote:

> ----- Original Message -----
> From: <dte...@freebsd.org>
> To: "'Ian Lepore'" <free...@damnhippie.dyndns.org>
> Cc: <freebsd-hackers@freebsd.org>; <dte...@freebsd.org>
> Sent: Wednesday, January 16, 2013 12:56 AM
> Subject: RE: kgzip(1) is broken
>
>>> -----Original Message-----
>>> From: Ian Lepore [mailto:free...@damnhippie.dyndns.org]
>>> Sent: Tuesday, January 15, 2013 4:43 PM
>>> To: Devin Teske
>>> Cc: dte...@freebsd.org; freebsd-hackers@freebsd.org
>>> Subject: RE: kgzip(1) is broken
>>>
>>> On Tue, 2013-01-15 at 16:10 -0800, Devin Teske wrote:
>>>>
>>>>> -----Original Message-----
>>>>> From: Devin Teske [mailto:devin.te...@fisglobal.com] On Behalf Of
>>>>> dte...@freebsd.org
>>>>> Sent: Tuesday, January 15, 2013 3:10 PM
>>>>> To: 'Ian Lepore'
>>>>> Cc: freebsd-hackers@freebsd.org; dte...@freebsd.org
>>>>> Subject: RE: kgzip(1) is broken
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Ian Lepore [mailto:free...@damnhippie.dyndns.org]
>>>>>> Sent: Tuesday, January 15, 2013 3:05 PM
>>>>>> To: dte...@freebsd.org
>>>>>> Cc: freebsd-hackers@freebsd.org
>>>>>> Subject: Re: kgzip(1) is broken
>>>>>>
>>>>>> On Tue, 2013-01-15 at 13:27 -0800, dte...@freebsd.org wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>> I have been sad of-late because kgzip(1) no longer produces a
>>>>>>> usable kernel.
>>>>>>>
>>>>>>> All versions of 9.x suffer this.
>>>>>>>
>>>>>>> And somewhere between 8.3-RELEASE-p1 and 8.3-RELEASE-p5 this
>>>>>>> recently broke in the 8.x series.
>>>>>>>
>>>>>>> I haven't tried the 7 series lately, but if whatever is making
>>>>>>> the rounds gets MFC'd that far back, I expect the problem to
>>>>>>> percolate there too.
>>>>>>>
>>>>>>> The symptom is that the machine reboots immediately and
>>>>>>> unexpectedly the moment the kernel is executed by the loader.
>>>>>>>
>>>>>>> This is quite troubling and I am looking for someone to help
>>>>>>> find the culprit. I don't know where to start looking.
>>>>>>
>>>>>> Here are some possible candidates from the things that were MFC'd to 8
>>>>>> in that timeframe. I haven't looked at what these do, they're just
>>>>>> changes that affect files related to booting.
>>>>>>
>>>>>> r233211
>>>>>> r233377
>>>>>> r233469
>>>>>> r234563
>>>>>>
>>>>>
>>>>> Thanks Ian!
>>>>>
>>>>> I'll test each one individually to see if regressing any one (or all)
>>>>> addresses the problem.
>>>>
>>>> Progress...
>>>>
>>>> Looks like I found the culprit.
>>>>
>>>> Turns out it's a back-ported bxe(4) driver (back-ported from 9 -- where
>>>> kgzip seems to never work).
>>>>
>>>> I wonder why back-porting bxe(4) from stable/9 to releng/8.3 would cause
>>>> kgzip to produce non-working kernels.
>>>>
>>> Yeah, it'll be interesting to see how a device driver can lead to "the
>>> machine reboots immediately and unexpectedly the moment the kernel is
>>> executed by the loader," which I took to mean "before seeing the
>>> copyright or anything."
>>
>> Indeed... loader throws up the syms and upon execution *KABOOM* (screen
>> goes black and back to POST)
>>
>> The copyright never appears.
>>
>>>> I'm emailing the maintainers (davidch + other Broadcom folk)
>>
>> The current dossier is even more interesting... the back-ported driver
>> (with zero modifications mind you from stable/9 to stable/8) exhibits
>> memory failures (example below), and causes terminals to become wedged
>> when attempting to (for example) scp a file over an existing configured
>> network (igb-based -- presumably unrelated to bxe but in practice loading
>> bxe causes igb to misbehave).
>>
>> $ ifconfig bxe0 inet 192.168.1.5/24
>> bxe0: ../../../dev/bxe/if_bxe.c(10939): Memory allocation failure! Cannot fill fp[00] RX chain.
>> bxe0: ../../../dev/bxe/if_bxe.c(3921): NIC initialization failed, aborting!
>> $ ifconfig bxe1 inet 192.168.1.6/24
>> bxe1: ../../../dev/bxe/if_bxe.c(10939): Memory allocation failure! Cannot fill fp[00] RX chain.
>> bxe1: ../../../dev/bxe/if_bxe.c(3921): NIC initialization failed, aborting!
>>
>> (as expected, also sent mail off to maintainers w/respect to above
>> notes/errors)
>
> Sounds like you may be out of mbufs which is easy, on a box with 4 igb's
> simply booting without tuning with cause this so, if you have igb's and
> bxe's this could be your cause.
>
> Try adding the following to loader.conf and see if it helps:-
> kern.ipc.nmbclusters=51200
>
Sorry for the delayed response -- we had to go through a power cycle.

I haven't yet tried bumping the value as suggested, but I suspect it will
indeed help greatly -- I noticed that I got 18% into the scp before things
took a turn for the worse (hanging terminals and such).

Another thing worth noting about the uplifted bxe(4) plopped into
RELENG_8… when we rebooted:

bxe0: ../../../dev/bxe/if_bxe.c(6419): Slowpath queue is full!
bxe0: ---------- Begin crash dump ----------
bxe0: ---------- End crash dump ----------
bxe0: ../../../dev/bxe/if_bxe.c(6419): Slowpath queue is full!
bxe0: ---------- Begin crash dump ----------
bxe0: ---------- End crash dump ----------
bxe0: ../../../dev/bxe/if_bxe.c(3262): fp[01] client ramrod halt failed!

Heh. The machine had to be hard cycled.
--
Devin
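
For anyone following along, here is a minimal sketch of applying the tunable suggested in the thread without ending up with duplicate entries. The helper name set_tunable is hypothetical (not part of FreeBSD), and the sketch assumes the usual one-setting-per-line name=value layout of loader.conf, written unquoted as in the suggestion above.

```shell
#!/bin/sh
# set_tunable FILE NAME VALUE
# Hypothetical helper: make sure FILE contains exactly one NAME=VALUE line.
# Assumes a loader.conf-style file with one name=value setting per line.
set_tunable() {
    file=$1
    name=$2
    value=$3
    tmp="$file.tmp.$$"

    # Create the file if it does not exist yet.
    [ -f "$file" ] || : > "$file"

    # Copy every line whose name (text before the first '=') differs
    # from NAME, then append the new setting at the end.
    awk -F= -v n="$name" '$1 != n' "$file" > "$tmp"
    printf '%s=%s\n' "$name" "$value" >> "$tmp"

    mv "$tmp" "$file"
}
```

On the affected box this would be run as root as `set_tunable /boot/loader.conf kern.ipc.nmbclusters 51200`, followed by a reboot so the loader picks up the new value. Comparing `netstat -m` output before and after should show whether mbuf cluster requests are still being denied.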