On Sun, Oct 21, 2012 at 03:13:56PM +0300, Konstantin Belousov wrote:
> On Sat, Oct 20, 2012 at 07:10:19AM -0700, David Wolfskill wrote:
> > This seems ... fairly weird to me.
> >
> > Yesterday, I built & booted:
> >
> > FreeBSD g1-227.catwhisker.org 9.1-PRERELEASE FreeBSD 9.1-PRERELEASE #274
> > 241726M: Fri Oct 19 05:40:05 PDT 2012
> > r...@g1-227.catwhisker.org:/usr/obj/usr/src/sys/CANARY i386
> >
> > and used the machine all day; nothing unusual (including various
> > reboots (e.g. when I disembarked the train for the final leg of my
> > commute home, so I powered the laptop off).
> >
> > This morning, I built:
> >
> > FreeBSD g1-227.catwhisker.org 9.1-PRERELEASE FreeBSD 9.1-PRERELEASE #275
> > 241776M: Sat Oct 20 04:34:45 PDT 2012
> > r...@g1-227.catwhisker.org:/usr/obj/usr/src/sys/CANARY i386
> >
> > and on first reboot, I got a panic.
> >
> > After a bit of experimentation, it appears that I get a panic @r241776
> > if I attempt a normal boot into multi-user mode, but if I first boot to
> > single-user mode, then exit single-user mode, it comes up without a
> > problem.
> >
> > I don't have a serial console, so I started to write down some of the
> > panic information, but my patience ran a bit short. Here's whet I
> > recorded (warning: hand-transcripted -- twice!):
> >
> > ...
> > Starting devd.
> > REDZONE: Buffer underflow detected. 1 byte corrupted before 0xced40080
> > (4294966796 bytes allocated).
> > Allocation backtrace:
> > #0 0xc0ceac8f at redzone_setup+0xcf
> > #1 0xc0a5d5c9 at malloc+0x1d9
> > ...[about 20 more such lines I didn't record]...
> >
> > > bt
> > Tracing pid 901 tid 100106 td 0xd2b99000
> > kdb_enter(...)
> > panic(...)
> > free(...)
> > devread(ce8c2d00,f7274c0c,0,c0b1e4f0,d279e380,...) at devread+0x1a6
> > giant_read(...) at giant_read+0x87
> > devfs_read(...) at devfs_read+0xc6
> > dofileread(...) at dofileread+0x99
> > sys_read(...) at sys_read+0x98
> > syscall(f7274d08) at syscall+0x387
> >
> > Within the bounds described above, this appears to be quite reproducible
> > -- on my laptop. My build machine (updated in parallel, at the same
> > GRNs) does not exhibit the panic.
> >
> > I was unable to get a crash dump; I have
> >
> > dumpdev="AUTO"
> >
> > in /etc/rc.conf, and the panic was occurring well after swap was
> > enabled. (Yes, I know I have swap over-allocated. I plan to do
> > something about it at some point.)
> >
> > I've attached a copy of dmesg.boot.
> >
> > Anyone else seeing this? Any ideas how to diagnose it?
>
> devread is the method of devctl(4) which passes devd notifications from
> the kernel to userland (to devd, specifically). There were no changes to
> devctl(4) for quite a time.
>
> The corruption is, most likely, in some unrelated piece of code. Could
> you try to bisect the stable to catch the offender ? The bisect is not
> guaranteed to work, obviously, since the random corruption effects are
> unpredictable.

[Lack of trimming is deliberate, in this case, as I found a reversion
that appears to address the issue, and I wanted folks looking at this to
have the bulk of the symptoms readily at hand. -- dhw]

The range of GRNs in question is 241726 - 241776, only 5 of which apply
to stable/9. Here's a list, with the affected files listed:

241742  sys/dev/sound/pci/hda/hdaa_patches.c
241749  sys/cam/cam_queue.c
241762  sys/dev/tws/tws.c
        sys/dev/tws/tws.h
        sys/dev/tws/tws_cam.c
        sys/dev/tws/tws_hdm.h
        sys/dev/tws/tws_user.c
241767  usr.bin/make/var.c
241769  sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c

I had actually tried reverting 241742 yesterday, to no effect.

I don't use ZFS, and I have a pretty hard time understanding how 241767
would break one machine and leave 4 others unscathed. (Yes, I completed
my weekly updates, as well, by now.)

I don't have tws(4) devices -- certainly not on the laptop.

So I tried reverting 241749 ... and I failed to reproduce the problem.
Well, one boot out of one, at least.

I'll try a few more reality checks, and report back if a correction is
in order. But (for now, at least), it looks to me as if 241749 is
presenting a problem on this laptop.

For folks investigating, I attached a dmesg.boot to the initial post in
the thread; I'll be happy to provide more information, should it be
requested (& specified).

Peace,
david
-- 
David H. Wolfskill                              da...@catwhisker.org
Taliban: Evil men with guns afraid of truth from a 14-year old girl.

See http://www.catwhisker.org/~david/publickey.gpg for my public key.