Re: debugging frequent kernel panics on 8.2-RELEASE
On Thursday, August 18, 2011 4:09:35 pm Andriy Gapon wrote:
> on 17/08/2011 23:21 Andriy Gapon said the following:
> > It seems like everything starts with some kind of a race between
> > terminating processes in a jail and termination of the jail itself.
> > This is where the details are very thin so far. What we see is that a
> > process (http) is in exit(2) syscall, in exit1() function actually, and
> > past the place where P_WEXIT flag is set and even past the place where
> > p_limit is freed and reset to NULL.
> > At that place the thread calls prison_proc_free(), which calls
> > prison_deref().
> > Then, we see that in prison_deref() the thread gets a page fault because
> > of what seems like a NULL pointer dereference. That's just the start of
> > the problem and its root cause.
> >
> > Then, trap_pfault() gets invoked and, because addresses close to NULL
> > look like userspace addresses, vm_fault/vm_fault_hold gets called, which
> > in its turn goes on to call vm_map_growstack. First thing that
> > vm_map_growstack does is a call to lim_cur(), but because p_limit is
> > already NULL, that call results in a NULL pointer dereference and a page
> > fault. Goto the beginning of this paragraph.
> >
> > So we get this recursion of sorts, which only ends when a stack is
> > exhausted and a CPU generates a double-fault.
>
> BTW, does anyone has an idea why the thread in question would "disappear"
> from the kgdb's point of view?
>
> (kgdb) p cpuid_to_pcpu[2]->pc_curthread->td_tid
> $3 = 102057
> (kgdb) tid 102057
> invalid tid
>
> info threads also doesn't list the thread.
>
> Is it because the panic happened while the thread was somewhere in exit1()?

Yes, it is a bug in kgdb that it only walks allproc and not zombproc. Try
this:

Index: kthr.c
===
--- kthr.c	(revision 224879)
+++ kthr.c	(working copy)
@@ -73,11 +73,52 @@ kgdb_thr_first(void)
 	return (first);
 }
 
+static void
+kgdb_thr_add_procs(uintptr_t paddr)
+{
+	struct proc p;
+	struct thread td;
+	struct kthr *kt;
+	CORE_ADDR addr;
+
+	while (paddr != 0) {
+		if (kvm_read(kvm, paddr, &p, sizeof(p)) != sizeof(p)) {
+			warnx("kvm_read: %s", kvm_geterr(kvm));
+			break;
+		}
+		addr = (uintptr_t)TAILQ_FIRST(&p.p_threads);
+		while (addr != 0) {
+			if (kvm_read(kvm, addr, &td, sizeof(td)) !=
+			    sizeof(td)) {
+				warnx("kvm_read: %s", kvm_geterr(kvm));
+				break;
+			}
+			kt = malloc(sizeof(*kt));
+			kt->next = first;
+			kt->kaddr = addr;
+			if (td.td_tid == dumptid)
+				kt->pcb = dumppcb;
+			else if (td.td_state == TDS_RUNNING && stoppcbs != 0 &&
+			    CPU_ISSET(td.td_oncpu, &stopped_cpus))
+				kt->pcb = (uintptr_t)stoppcbs +
+				    sizeof(struct pcb) * td.td_oncpu;
+			else
+				kt->pcb = (uintptr_t)td.td_pcb;
+			kt->kstack = td.td_kstack;
+			kt->tid = td.td_tid;
+			kt->pid = p.p_pid;
+			kt->paddr = paddr;
+			kt->cpu = td.td_oncpu;
+			first = kt;
+			addr = (uintptr_t)TAILQ_NEXT(&td, td_plist);
+		}
+		paddr = (uintptr_t)LIST_NEXT(&p, p_list);
+	}
+}
+
 struct kthr *
 kgdb_thr_init(void)
 {
-	struct proc p;
-	struct thread td;
 	long cpusetsize;
 	struct kthr *kt;
 	CORE_ADDR addr;
@@ -113,37 +154,11 @@ kgdb_thr_init(void)
 
 	stoppcbs = kgdb_lookup("stoppcbs");
 
-	while (paddr != 0) {
-		if (kvm_read(kvm, paddr, &p, sizeof(p)) != sizeof(p)) {
-			warnx("kvm_read: %s", kvm_geterr(kvm));
-			break;
-		}
-		addr = (uintptr_t)TAILQ_FIRST(&p.p_threads);
-		while (addr != 0) {
-			if (kvm_read(kvm, addr, &td, sizeof(td)) !=
-			    sizeof(td)) {
-				warnx("kvm_read: %s", kvm_geterr(kvm));
-				break;
-			}
-			kt = malloc(sizeof(*kt));
-			kt->next = first;
-			kt->kaddr = addr;
-			if (td.td_tid == dumptid)
-				kt->pcb = dumppcb;
-			else if (td.td_state == TDS_RUNNING && stoppcbs != 0 &&
-			    CPU_ISSET(td.td_oncpu, &stopped_cpus))
-				kt
Re: panic: spin lock held too long (RELENG_8 from today)
On 8/18/2011 8:37 PM, Chip Camden wrote:
>> st> Thanks, Attilio. I've applied the patch and removed the extra debug
>> st> options I had added (though keeping debug symbols). I'll let you know if
>> st> I experience any more panics.
>>
>> No panic for 20 hours at this moment, FYI. For my NFS server, I
>> think another 24 hours would be sufficient to confirm the stability.
>> I will see how it works...
>>
>> -- Hiroki
>
> Likewise:
>
> $ uptime
> 5:37PM up 21:45, 5 users, load averages: 0.68, 0.45, 0.63
>
> So far, so good (knocks on head).
>

0(ns4)% uptime
8:55AM up 22:39, 3 users, load averages: 0.01, 0.00, 0.00
0(ns4)%

So far so good for me too

---Mike

--
---
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, m...@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada http://www.tancsa.com/
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: panic: spin lock held too long (RELENG_8 from today)
Quoth Mike Tancsa on Friday, 19 August 2011:
> On 8/18/2011 8:37 PM, Chip Camden wrote:
>
> >> st> Thanks, Attilio. I've applied the patch and removed the extra debug
> >> st> options I had added (though keeping debug symbols). I'll let you
> >> st> know if I experience any more panics.
> >>
> >> No panic for 20 hours at this moment, FYI. For my NFS server, I
> >> think another 24 hours would be sufficient to confirm the stability.
> >> I will see how it works...
> >>
> >> -- Hiroki
> >
> > Likewise:
> >
> > $ uptime
> > 5:37PM up 21:45, 5 users, load averages: 0.68, 0.45, 0.63
> >
> > So far, so good (knocks on head).
> >
>
> 0(ns4)% uptime
> 8:55AM up 22:39, 3 users, load averages: 0.01, 0.00, 0.00
> 0(ns4)%
>
> So far so good for me too
>
> ---Mike
>
> --
> ---
> Mike Tancsa, tel +1 519 651 3400
> Sentex Communications, m...@sentex.net
> Providing Internet services since 1994 www.sentex.net
> Cambridge, Ontario Canada http://www.tancsa.com/

Still up and running here.

8:02AM up 1 day, 12:10, 4 users, load averages: 0.08, 0.26, 0.52

After the panics began, I never went more than 12 hours without one before
applying this patch. I think you nailed it, Attilio. Or at least, you moved
it.

--
.O. | Sterling (Chip) Camden | http://camdensoftware.com
..O | sterl...@camdensoftware.com | http://chipsquips.com
OOO | 2048R/D6DBAF91 | http://chipstips.com
Re: debugging frequent kernel panics on 8.2-RELEASE
on 19/08/2011 15:14 John Baldwin said the following:
> Yes, it is a bug in kgdb that it only walks allproc and not zombproc. Try
> this:

The patch worked perfectly well for me, thank you!

> Index: kthr.c
> ===
> --- kthr.c	(revision 224879)
> +++ kthr.c	(working copy)
> @@ -73,11 +73,52 @@ kgdb_thr_first(void)
>  	return (first);
>  }
>  
> +static void
> +kgdb_thr_add_procs(uintptr_t paddr)
> +{
> +	struct proc p;
> +	struct thread td;
> +	struct kthr *kt;
> +	CORE_ADDR addr;
> +
> +	while (paddr != 0) {
> +		if (kvm_read(kvm, paddr, &p, sizeof(p)) != sizeof(p)) {
> +			warnx("kvm_read: %s", kvm_geterr(kvm));
> +			break;
> +		}
> +		addr = (uintptr_t)TAILQ_FIRST(&p.p_threads);
> +		while (addr != 0) {
> +			if (kvm_read(kvm, addr, &td, sizeof(td)) !=
> +			    sizeof(td)) {
> +				warnx("kvm_read: %s", kvm_geterr(kvm));
> +				break;
> +			}
> +			kt = malloc(sizeof(*kt));
> +			kt->next = first;
> +			kt->kaddr = addr;
> +			if (td.td_tid == dumptid)
> +				kt->pcb = dumppcb;
> +			else if (td.td_state == TDS_RUNNING && stoppcbs != 0 &&
> +			    CPU_ISSET(td.td_oncpu, &stopped_cpus))
> +				kt->pcb = (uintptr_t)stoppcbs +
> +				    sizeof(struct pcb) * td.td_oncpu;
> +			else
> +				kt->pcb = (uintptr_t)td.td_pcb;
> +			kt->kstack = td.td_kstack;
> +			kt->tid = td.td_tid;
> +			kt->pid = p.p_pid;
> +			kt->paddr = paddr;
> +			kt->cpu = td.td_oncpu;
> +			first = kt;
> +			addr = (uintptr_t)TAILQ_NEXT(&td, td_plist);
> +		}
> +		paddr = (uintptr_t)LIST_NEXT(&p, p_list);
> +	}
> +}
> +
>  struct kthr *
>  kgdb_thr_init(void)
>  {
> -	struct proc p;
> -	struct thread td;
>  	long cpusetsize;
>  	struct kthr *kt;
>  	CORE_ADDR addr;
> @@ -113,37 +154,11 @@ kgdb_thr_init(void)
>  
>  	stoppcbs = kgdb_lookup("stoppcbs");
>  
> -	while (paddr != 0) {
> -		if (kvm_read(kvm, paddr, &p, sizeof(p)) != sizeof(p)) {
> -			warnx("kvm_read: %s", kvm_geterr(kvm));
> -			break;
> -		}
> -		addr = (uintptr_t)TAILQ_FIRST(&p.p_threads);
> -		while (addr != 0) {
> -			if (kvm_read(kvm, addr, &td, sizeof(td)) !=
> -			    sizeof(td)) {
> -				warnx("kvm_read: %s", kvm_geterr(kvm));
> -				break;
> -			}
> -			kt = malloc(sizeof(*kt));
> -			kt->next = first;
> -			kt->kaddr = addr;
> -			if (td.td_tid == dumptid)
> -				kt->pcb = dumppcb;
> -			else if (td.td_state == TDS_RUNNING && stoppcbs != 0 &&
> -			    CPU_ISSET(td.td_oncpu, &stopped_cpus))
> -				kt->pcb = (uintptr_t) stoppcbs + sizeof(struct pcb) * td.td_oncpu;
> -			else
> -				kt->pcb = (uintptr_t)td.td_pcb;
> -			kt->kstack = td.td_kstack;
> -			kt->tid = td.td_tid;
> -			kt->pid = p.p_pid;
> -			kt->paddr = paddr;
> -			kt->cpu = td.td_oncpu;
> -			first = kt;
> -			addr = (uintptr_t)TAILQ_NEXT(&td, td_plist);
> -		}
> -		paddr = (uintptr_t)LIST_NEXT(&p, p_list);
> +	kgdb_thr_add_procs(paddr);
> +	addr = kgdb_lookup("zombproc");
> +	if (addr != 0) {
> +		kvm_read(kvm, addr, &paddr, sizeof(paddr));
> +		kgdb_thr_add_procs(paddr);
>  	}
>  	curkthr = kgdb_thr_lookup_tid(dumptid);
>  	if (curkthr == NULL)
>
>> is there an easy way to examine its stack in this case?
>
> Hmm, you can use something like this from my kgdb macros.

Oh, I completely forgot about them. I hope I will remember where to search
for the tricks next time I need them :-)
Thank you again!

> For amd64:
>
> # Do a backtrace given %rip and %rbp as args
> define bt
>   set $_rip = $arg0
>   set $_rbp = $arg1
>   set $i = 0
>   while ($_rbp != 0 || $_rip != 0)
>     printf "%2d: pc ", $i
>     if ($_rip != 0)
>       x/1i $_rip
>     else
>       printf "\n"
>     end
>     if ($_rbp == 0)
>       set $_rip = 0
>     else
>       set $fr = (struct amd64_frame *)$_rbp
>       set $_rbp = $fr->f_frame
Re: USB/coredump hangs in 8 and 9
on 19/08/2011 00:24 Hans Petter Selasky said the following:
> On Thursday 18 August 2011 19:04:10 Andriy Gapon wrote:
>> If you can help Hans to figure out what you is wrong with USB subsystem in
>> this respect that would help us all.
>
> Hi,
>
> usb_busdma.c: /* we use "mtx_owned()" instead of this function */
> usb_busdma.c: owned = mtx_owned(uptag->mtx);
> usb_compat_linux.c: do_unlock = mtx_owned(&Giant) ? 0 : 1;
> usb_compat_linux.c: do_unlock = mtx_owned(&Giant) ? 0 : 1;
> usb_compat_linux.c: do_unlock = mtx_owned(&Giant) ? 0 : 1;
> usb_hub.c: if (mtx_owned(&bus->bus_mtx)) {
> usb_transfer.c: if (!mtx_owned(info->xfer_mtx)) {
> usb_transfer.c: if (mtx_owned(xfer->xroot->xfer_mtx)) {
> usb_transfer.c: while (mtx_owned(&xroot->udev->bus->bus_mtx)) {
> usb_transfer.c: while (mtx_owned(xroot->xfer_mtx)) {
>
> One fix you will need to do, if mtx_owned is not giving correct value is:

First, could you please clarify what the correct, or rather the expected,
value is in this case. It's not immediately clear to me whether we should
consider all locks as owned or un-owned in a situation where all locks are
actually skipped behind the scenes. Maybe the USB code should explicitly
check for that condition so as not to make any unsafe assumptions.

Second, it's not clear to me what the above list actually represents in the
context of this discussion.

> static void
> usbd_callback_wrapper(struct usb_xfer_queue *pq)
> {
>         struct usb_xfer *xfer = pq->curr;
>         struct usb_xfer_root *info = xfer->xroot;
>
>         USB_BUS_LOCK_ASSERT(info->bus, MA_OWNED);
>         if (!mtx_owned(info->xfer_mtx)) {
>
> The above "if" should be anded with && !paniced && !dumping ... or maybe the
> new not scheduling variable is good for this purpose?
>
>         /*
>          * Cases that end up here:
>          *
>
> #if USB_HAVE_BUSDMA
>         if (mtx_owned(xfer->xroot->xfer_mtx)) {
>                 struct usb_xfer_queue *pq;
>
>
> This case is more like a BUS-DMA error case, and is not so important to
> execute.
>
> --HPS

--
Andriy Gapon
Re: USB/coredump hangs in 8 and 9
2011/8/12 Andrew Boyer :
> Re: panic: bufwrite: buffer is not busy??? (originally on freebsd-net)
> Re: debugging frequent kernel panics on 8.2-RELEASE (originally on
> freebsd-stable)
> Re: System hang in USB umass module while processing panic (originally on
> freebsd-usb)
>
> Hello Andriy and Hans,
>
> Sorry for tying in so many discussions on this topic, but I think I have an
> explanation for the problems we have been reporting* with hanging coredumps
> on multicore systems on 8.2-RELEASE, and it has implications for Andriy's
> proposed scheduler patch** and for USB.
>
> In today's 8.X and 9.X branches, nothing that I can find stops the other
> CPUs when the kernel panics, but many parts of the locking code get disabled
> (grep on 'panicstr'). The 'bufwrite: buffer is not busy???' panic is caused
> by the syncer encountering an error. If that happens when it's on the
> dumping CPU everything hangs. If it's running on a different CPU, it will be
> blocked and hidden by the panic_cpu spinlock in panic(), and the dump
> continues, polling every attached keyboard for a Ctl-C.
>
> But, the new 8.X USB stack relies on multithreading. (The new stack is the
> variable that broke coredumps for us in the 7.1->8.2 transition, I think.)
> SVN 224223 fixes a hang that would happen when dumpsys() polls the USB
> keyboard (IPMI KVM, in our case). That helps, but it only gets as far as
> usb_process(), where it hangs in a loop around a cv_wait() call. This is
> easy to reproduce by adding code to the watchdog to break into the debugger
> if panicstr is set.
>
> I am experimenting with Andriy's patch** to stop the scheduler and it seems
> to be most of the way there, stopping the CPUs and disabling the rest of
> locking. There are a few places that still reference panicstr, but that's
> minor. These are the changes I made to the patch:
> * Changed ukbd_do_poll() to return immediately if SCHEDULER_STOPPED() is
> true, so that we don't hang up in USB. ukbd_yield() locks up in
> DROP_GIANT(), and if you skip ukbd_yield(), usbd_transfer_poll() locks up
> trying to drop mutexes.
> * Changed the call to spinlock_enter() back to critical_enter(), so that
> interrupts stay enabled and the hardclock still functions.

Which spinlock_enter() are you referring to here?
I think that having fast interrupt handlers running during panic/shutdown is
something we should avoid like hell.

Attilio

--
Peace can only be achieved by understanding - A. Einstein
bad sector in gmirror HDD
System in question: FreeBSD 8.2-STABLE #3: Thu Mar 3 04:52:04 GMT 2011

After a recent power failure, I'm seeing this in my logs:

Aug 19 20:36:34 bast smartd[1575]: Device: /dev/ad2, 2 Currently unreadable (pending) sectors

And gmirror reports:

# gmirror status
      Name    Status  Components
mirror/gm0  DEGRADED  ad0 (100%)
                      ad2

I think the solution is: gmirror rebuild

Comments?

Searching on that error message, I was led to believe that identifying the
bad sector and running dd to read it would cause the HDD to reallocate that
bad block.

http://smartmontools.sourceforge.net/badblockhowto.html

However, since ad2 is one half of a gmirror, I don't think this is the best
approach.

Comments?

More information:

smartd, gpart, dh, diskinfo, and fdisk output at
http://beta.freebsddiary.org/smart-fixing-bad-sector.php

also:

# gmirror list
Geom name: gm0
State: DEGRADED
Components: 2
Balance: round-robin
Slice: 4096
Flags: NONE
GenID: 0
SyncID: 1
ID: 3362720654
Providers:
1. Name: mirror/gm0
   Mediasize: 40027028992 (37G)
   Sectorsize: 512
   Mode: r6w5e14
Consumers:
1. Name: ad0
   Mediasize: 40027029504 (37G)
   Sectorsize: 512
   Mode: r1w1e1
   State: SYNCHRONIZING
   Priority: 0
   Flags: DIRTY, SYNCHRONIZING
   GenID: 0
   SyncID: 1
   Synchronized: 100%
   ID: 949692477
2. Name: ad2
   Mediasize: 40027029504 (37G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 0
   Flags: DIRTY, BROKEN
   GenID: 0
   SyncID: 1
   ID: 3585934016

--
Dan Langille - http://langille.org
Re: bad sector in gmirror HDD
On Aug 19, 2011, at 1:50 PM, Dan Langille wrote:
> Searching on that error message, I was led to believe that identifying the
> bad sector and running dd to read it would cause the HDD to reallocate that
> bad block.
>
> http://smartmontools.sourceforge.net/badblockhowto.html
>
> However, since ad2 is one half of a gmirror, I don't think this is the best
> approach.
>
> Comments?

Reading the underlying failing drive with dd will help identify any other
questionable sectors.

However, your drive temps are too high-- many vendors call out either 50C or
55C as the point where drive reliability becomes significantly degraded.

Regards,
--
-Chuck
Re: bad sector in gmirror HDD
On Fri, Aug 19, 2011 at 04:50:01PM -0400, Dan Langille wrote:
> System in question: FreeBSD 8.2-STABLE #3: Thu Mar 3 04:52:04 GMT 2011
>
> After a recent power failure, I'm seeing this in my logs:
>
> Aug 19 20:36:34 bast smartd[1575]: Device: /dev/ad2, 2 Currently unreadable
> (pending) sectors

I doubt this is related to a power failure.

> Searching on that error message, I was led to believe that identifying the
> bad sector and running dd to read it would cause the HDD to reallocate that
> bad block.
>
> http://smartmontools.sourceforge.net/badblockhowto.html

This is incorrect (meaning you've misunderstood what's written there).

Unreadable LBAs can be a result of the LBA being actually bad (as in
uncorrectable), or the LBA being marked "suspect". In either case the
LBA will return an I/O error when read.

If the LBAs are marked "suspect", the drive will perform re-analysis of
the LBA (to determine if the LBA can be read and the data re-mapped, or
if it cannot then the LBA is marked uncorrectable) when you **write** to
the LBA.

The above smartd output doesn't tell me much. Providing actual SMART
attribute data (smartctl -a) for the drive would help. The brand of the
drive, the firmware version, and the model all matter -- every drive
behaves a little differently.

Furthermore, if the LBA is re-analysed and determined to be
uncorrectable -- regardless of remapping -- this doesn't actually fix
I/O errors on a filesystem level. The filesystem itself (and more often
than not in the data section of the file/inode, so things like fsck
can't work around this) can still contain references to the LBA which is
uncorrectable, and will still continue to return I/O errors when read.

There has to be a way to tell the filesystem, when formatted, "avoid use
of this LBA". How UFS/FFS handles this is unknown to me. I know of
badsect(8) but I don't know if this works. "Transparent" remapping I
have never seen work except on SSDs.

If you want me to step you through the procedure of re-testing the LBAs
(assuming they're suspect and not uncorrectable) I can do so, just ask.
Finding the suspect LBAs can be done using a dd loop (I wrote a shell
script for this), or using "smartctl -t select,0-max /dev/XXX" and
letting the drive's internal selective test see if it can find them.
From there it's an issue of submitting a write request to the LBA and
seeing what happens (I do this via dd as well, but the parameters you
pass it are very specific, e.g. don't mix up/misunderstand seek vs.
skip).

I've assisted with this time and time again for folks on forums with
varying success. I've also found some models of drives which claim
there are suspect LBAs yet an internal surface scan passes with no
issues (and these are drives which I myself have; the only difference
between my drives and the individuals' drives is firmware, which leads
me to believe there is a bug on some drives in the field).

--
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |
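The seek vs. skip distinction mentioned in the message above can be sketched as follows. This is only a minimal illustration, not the script referred to in the message: the function names and the LBA 12345 are hypothetical, a 512-byte sector size is assumed (the value gmirror list reports for ad2), and the helpers only print the dd command lines instead of running them, because the write variant destroys whatever the sector held.

```sh
#!/bin/sh
# Hypothetical helpers: they only PRINT dd command lines; nothing touches a disk.
# Assumption: 512-byte sectors, so one dd block (bs=512) equals one LBA.

# Read one sector. skip= counts blocks on the INPUT (if=) side,
# so it selects which LBA is read from the device.
lba_read_cmd() {
    printf 'dd if=%s of=/dev/null bs=512 count=1 skip=%s\n' "$1" "$2"
}

# Overwrite one sector with zeros. seek= counts blocks on the OUTPUT (of=)
# side, so it selects which LBA is written on the device.
# Using skip= here by mistake would overwrite LBA 0 instead.
lba_write_cmd() {
    printf 'dd if=/dev/zero of=%s bs=512 count=1 seek=%s\n' "$1" "$2"
}

# Example with a made-up LBA; review the printed lines before running them.
lba_read_cmd /dev/ad2 12345
lba_write_cmd /dev/ad2 12345
```

Printing rather than executing is a deliberate choice here: it leaves room to double-check the device name and the LBA before anything is written to a raw disk.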
Re: panic: spin lock held too long (RELENG_8 from today)
If nobody complains about it earlier, I'll propose the patch to re@ in 8
hours.

Attilio

2011/8/19 Mike Tancsa :
> On 8/18/2011 8:37 PM, Chip Camden wrote:
>
>>> st> Thanks, Attilio. I've applied the patch and removed the extra debug
>>> st> options I had added (though keeping debug symbols). I'll let you
>>> st> know if I experience any more panics.
>>>
>>> No panic for 20 hours at this moment, FYI. For my NFS server, I
>>> think another 24 hours would be sufficient to confirm the stability.
>>> I will see how it works...
>>>
>>> -- Hiroki
>>
>> Likewise:
>>
>> $ uptime
>> 5:37PM up 21:45, 5 users, load averages: 0.68, 0.45, 0.63
>>
>> So far, so good (knocks on head).
>>
>
> 0(ns4)% uptime
> 8:55AM up 22:39, 3 users, load averages: 0.01, 0.00, 0.00
> 0(ns4)%
>
> So far so good for me too
>
> ---Mike
>
> --
> ---
> Mike Tancsa, tel +1 519 651 3400
> Sentex Communications, m...@sentex.net
> Providing Internet services since 1994 www.sentex.net
> Cambridge, Ontario Canada http://www.tancsa.com/
>

--
Peace can only be achieved by understanding - A. Einstein
Re: bad sector in gmirror HDD
On Fri, Aug 19, 2011 at 04:50:01PM -0400, Dan Langille wrote:
> System in question: FreeBSD 8.2-STABLE #3: Thu Mar 3 04:52:04 GMT 2011
>
> After a recent power failure, I'm seeing this in my logs:
>
> Aug 19 20:36:34 bast smartd[1575]: Device: /dev/ad2, 2 Currently unreadable
> (pending) sectors
>

Personally, I'd replace that drive now.

> Searching on that error message, I was led to believe that identifying the
> bad sector and running dd to read it would cause the HDD to reallocate that
> bad block.

No, as otherwise mentioned (Hi Jeremy!) you need to read and write the
block. This could buy you a few more days or a few more weeks. Personally,
I would not wait. Your call.

> Comments?
...
> Dan Langille - http://langille.org

- Diane
--
- d...@freebsd.org d...@db.net http://www.db.net/~db
Why leave money to our children if we don't leave them the Earth?
Re: bad sector in gmirror HDD
On Fri, Aug 19, 2011 at 4:57 PM, Diane Bruce wrote:
> On Fri, Aug 19, 2011 at 04:50:01PM -0400, Dan Langille wrote:
>> System in question: FreeBSD 8.2-STABLE #3: Thu Mar 3 04:52:04 GMT 2011
>>
>> After a recent power failure, I'm seeing this in my logs:
>>
>> Aug 19 20:36:34 bast smartd[1575]: Device: /dev/ad2, 2 Currently unreadable
>> (pending) sectors
>>
>
> Personally, I'd replace that drive now.
>
>> Searching on that error message, I was led to believe that identifying the
>> bad sector and running dd to read it would cause the HDD to reallocate
>> that bad block.
>
> No, as otherwise mentioned (Hi Jeremy!) you need to read and write the
> block. This could buy you a few more days or a few more weeks. Personally,
> I would not wait. Your call.
>

While I largely agree, it depends on several factors as to whether I'd
replace the drive.

First, what does SMART show other than these errors? If the reported
statistics look generally good, and considering that you have a mirror with
one "good" copy of the blocks in question, the impact is zero unless
the other drive fails. That is why the blocks need to be re-written so
that they will be re-located on the drive.

Second, how critical is the data? The mirror gives good integrity, but
you also need good backups. If the data MUST be on-line with high
reliability, buy a replacement drive. You need to look at cost-benefit
(or really the cost of replacement vs. cost of failure).

It's worth mentioning that all drives have bad blocks. Most are hard
bad blocks and are re-mapped before the drive is shipped, but marginal
bad blocks can and do slip through to customers and it is entirely
possible that the drive is just fine for the most part and replacing
it is really a waste of money.

Only you can make the call, but if further bad blocks show up in the
near term, I'll go along with recommending replacement.
--
R. Kevin Oberman, Network Engineer - Retired
E-mail: kob6...@gmail.com
Re: bad sector in gmirror HDD
On Fri, Aug 19, 2011 at 05:51:02PM -0700, Kevin Oberman wrote:
> On Fri, Aug 19, 2011 at 4:57 PM, Diane Bruce wrote:
> > On Fri, Aug 19, 2011 at 04:50:01PM -0400, Dan Langille wrote:
> >> System in question: FreeBSD 8.2-STABLE #3: Thu Mar 3 04:52:04 GMT 2011
> >>
> >> After a recent power failure, I'm seeing this in my logs:
> >>
> >> Aug 19 20:36:34 bast smartd[1575]: Device: /dev/ad2, 2 Currently
> >> unreadable (pending) sectors
> >>
> >
> > Personally, I'd replace that drive now.
> >
> >> Searching on that error message, I was led to believe that identifying
> >> the bad sector and running dd to read it would cause the HDD to
> >> reallocate that bad block.
> >
> > No, as otherwise mentioned (Hi Jeremy!) you need to read and write the
> > block. This could buy you a few more days or a few more weeks. Personally,
> > I would not wait. Your call.
> >
>
> While I largely agree, it depends on several factors as to whether I'd
> replace the drive.
>
> First, what does SMART show other than these errors? If the reported
> statistics look generally good, and considering that you have a mirror with
> one "good" copy of the blocks in question, the impact is zero unless
> the other drive fails. That is why the blocks need to be re-written so
> that they will be re-located on the drive.
>
> Second, how critical is the data? The mirror gives good integrity, but
> you also need good backups. If the data MUST be on-line with high
> reliability, buy a replacement drive. You need to look at cost-benefit
> (or really the cost of replacement vs. cost of failure).
>
> It's worth mentioning that all drives have bad blocks. Most are hard
> bad blocks and are re-mapped before the drive is shipped, but marginal
> bad blocks can and do slip through to customers and it is entirely
> possible that the drive is just fine for the most part and replacing
> it is really a waste of money.
>
> Only you can make the call, but if further bad blocks show up in the
> near term, I'll go along with recommending replacement.

I can expand a bit on this.

With ATA/SATA and SCSI disks, there's a factory default list of LBAs
which are bad (referred to as the "physical defect list"). Everyone by
now is familiar with this.

With SCSI disks there are "grown defects", which is a drive-managed AND
user-managed list of LBAs which are considered bad. Whether these LBAs
were correctable (remapped) or not is tracked by SMART on SCSI. I can
provide many examples of this if people want to see what it looks like
(we have quite a collection of Fujitsu disks at my workplace. They're
one of a few vendors I more or less boycott).

With SCSI, you can clear the grown defect list with ease. Some drives
support clearing the physical defect list too, but doing that requires a
*true* low-level format to be done afterward. In the case you issue a
SCSI FORMAT command, any grown defects (as the drive encounters them)
will be "merged" with the physical defect list. When the FORMAT is
done, the drive will report 0 grown defects. Again, I can confirm this
exact behaviour with our Fujitsu disks at my workplace; it's easy to get
a list of the physical and grown defects with SCSI.

With ATA/SATA disks it's a different story:

It seems to vary from vendor to vendor and model to model. The
established theory is that the drive has a list of spare LBAs for
remappings, which is managed entirely by the drive itself -- and not
reported back to the user via SMART or any other means. This happens
entirely without user intervention, and (on repetitive errors) might
show up as the drive stalling on some I/O or other oddities. These
situations are not reported back to the OS either -- it's entirely 100%
transparent to the user.

When an ATA/SATA disk begins reporting errors back via SMART, or to the
OS (e.g. I/O error), on certain LBA accesses, then the theory is that
the spare LBA list used by the drive internally has been exhausted, and
it will begin using a different spare list (or an extension of the
existing spares; I'm not sure).

What Diane's getting at (Hi Diane!) is that since the drive is already
at the point of reporting errors back to the OS and SMART, it means the
drive has experienced problems (which it worked around) prior to this
point in time. Hence her recommendation to replace the drive.

What I still have a bit of trouble stomaching these days is whether or
not the above theories are still used *today* in practise on SATA disks.
Part of me is inclined to believe that **any** errors are reported to
SMART and the OS, and the remapping is reported via SMART, etc.; e.g.
there's no more "transparent" anything. The problem is that I don't
have a good way to confirm/deny this. Oh what I'd give for good
engineering contacts within Western Digital and Seagate...

These days, I replace drives depending upon their age (Power_On_Hours)
combined with how many errors are seen and what kind of errors. For
example, if I have a drive that's been in operation for
Re: bad sector in gmirror HDD
On Aug 19, 2011, at 7:21 PM, Jeremy Chadwick wrote:
> On Fri, Aug 19, 2011 at 04:50:01PM -0400, Dan Langille wrote:
>> System in question: FreeBSD 8.2-STABLE #3: Thu Mar 3 04:52:04 GMT 2011
>>
>> After a recent power failure, I'm seeing this in my logs:
>>
>> Aug 19 20:36:34 bast smartd[1575]: Device: /dev/ad2, 2 Currently unreadable
>> (pending) sectors
>
> I doubt this is related to a power failure.
>
>> Searching on that error message, I was led to believe that identifying the
>> bad sector and running dd to read it would cause the HDD to reallocate
>> that bad block.
>>
>> http://smartmontools.sourceforge.net/badblockhowto.html
>
> This is incorrect (meaning you've misunderstood what's written there).
>
> Unreadable LBAs can be a result of the LBA being actually bad (as in
> uncorrectable), or the LBA being marked "suspect". In either case the
> LBA will return an I/O error when read.
>
> If the LBAs are marked "suspect", the drive will perform re-analysis of
> the LBA (to determine if the LBA can be read and the data re-mapped, or
> if it cannot then the LBA is marked uncorrectable) when you **write** to
> the LBA.
>
> The above smartd output doesn't tell me much. Providing actual SMART
> attribute data (smartctl -a) for the drive would help. The brand of the
> drive, the firmware version, and the model all matter -- every drive
> behaves a little differently.

Information such as this?

http://beta.freebsddiary.org/smart-fixing-bad-sector.php

--
Dan Langille - http://langille.org
Re: panic: spin lock held too long (RELENG_8 from today)
Attilio Rao wrote in :

at> If nobody complains about it earlier, I'll propose the patch to re@ in
at> 8 hours.

Running fine for 45 hours so far. Please go ahead!

-- Hiroki
Re: bad sector in gmirror HDD
On Fri, Aug 19, 2011 at 09:39:17PM -0400, Dan Langille wrote:
>
> On Aug 19, 2011, at 7:21 PM, Jeremy Chadwick wrote:
>
> > On Fri, Aug 19, 2011 at 04:50:01PM -0400, Dan Langille wrote:
> >> System in question: FreeBSD 8.2-STABLE #3: Thu Mar 3 04:52:04 GMT 2011
> >>
> >> After a recent power failure, I'm seeing this in my logs:
> >>
> >> Aug 19 20:36:34 bast smartd[1575]: Device: /dev/ad2, 2 Currently
> >> unreadable (pending) sectors
> >
> > I doubt this is related to a power failure.
> >
> >> Searching on that error message, I was led to believe that identifying the
> >> bad sector and
> >> running dd to read it would cause the HDD to reallocate that bad block.
> >>
> >> http://smartmontools.sourceforge.net/badblockhowto.html
> >
> > This is incorrect (meaning you've misunderstood what's written there).
> >
> > Unreadable LBAs can be a result of the LBA being actually bad (as in
> > uncorrectable), or the LBA being marked "suspect". In either case the
> > LBA will return an I/O error when read.
> >
> > If the LBAs are marked "suspect", the drive will perform re-analysis of
> > the LBA (to determine if the LBA can be read and the data re-mapped, or
> > if it cannot then the LBA is marked uncorrectable) when you **write** to
> > the LBA.
> >
> > The above smartd output doesn't tell me much. Providing actual SMART
> > attribute data (smartctl -a) for the drive would help. The brand of the
> > drive, the firmware version, and the model all matter -- every drive
> > behaves a little differently.
>
> Information such as this?
> http://beta.freebsddiary.org/smart-fixing-bad-sector.php

Yes, perfect. Thank you.

First things first: upgrade smartmontools to 5.41. Your attributes will be the same after you do this (the drive is already in smartmontools' internal drive DB), but I often have to remind people that they really need to keep smartmontools updated as often as possible.
The changes between versions are vast; this is especially important for people with SSDs (I'm responsible for submitting some recent improvements for Intel 320 and 510 SSDs).

Anyway, the drive (albeit an old PATA Maxtor) appears to have three anomalies:

1) One confirmed reallocated LBA (SMART attribute 5)

2) One "suspect" LBA (SMART attribute 197)

3) A very high temperature of 51C (SMART attribute 194). If this drive is in an enclosure or in a system with no fans this would be understandable, otherwise this is a bit high. My home workstation, which has only one case fan, has a drive with more platters than your Maxtor, and it idles at ~38C. Possibly this drive has been undergoing constant I/O recently (which does greatly increase drive temperature)? Not sure. I'm not going to focus too much on this one.

The SMART error log also indicates an LBA failure at the 26000 hour mark (which is 16 hours prior to when you did smartctl -a /dev/ad2). Whether that LBA is the remapped one or the suspect one is unknown. The LBA was 5566440.

The SMART tests you did didn't really amount to anything; no surprise. Short and long tests usually do not test the surface of the disk. There are some drives which do it on a long test, but as I said before, everything varies from drive to drive.

Furthermore, on this model of drive, you cannot do surface scans via SMART. Bummer. That's indicated in the "Offline data collection capabilities" section at the top, where it reads:

    No Selective Self-test supported.

So you'll have to use the dd method. This takes longer than if surface scanning were supported by the drive, but is acceptable. I'll get to how to go about that in a moment.

The reallocated LBA cannot be dealt with aside from re-creating the filesystem and telling it not to use the LBA. I see no flags in newfs(8) that indicate a way to specify LBAs to avoid. And we don't know what LBA it is so we can't refer to it right now anyway.
As I said previously, I have no idea how UFS/FFS deals with this.

Using fsck(8) is not sufficient; fsck does not attempt reading every LBA on the disk or every LBA that makes up the data portions of an inode. It only examines the "structure" of the filesystem. Is it possible the remapped LBA lived within a structure region and not data? Yes. Is it likely? Given the size of the disk, probably not.

As mentioned previously too, there's badsect(8) but I don't know if it works correctly on present-day FreeBSD, if it works with larger drives, on 64-bit, etc. You get the idea. Plus as I said I don't know what LBA to tell it to avoid.

You also need to keep something in mind: the terms "sector" and "LBA" are in some ways interchangeable and in other ways aren't. I use the term LBA because nobody in their right mind uses CHS addressing any more. badsect(8) claims it wants sectors, which I want to assume are LBAs.

I hope someone familiar with UFS/FFS can explain how to go about this process for UFS/FFS.

As for ZFS (because I know someone will ask) -- AFAIK there is no mechanism
Re: bad sector in gmirror HDD
On Fri, 19 Aug 2011, Chuck Swiger wrote: Reading the underlying failing drive with dd will help identify any other questionable sectors. However, your drive temps are too high-- many vendors call out either 50C or 55C as the point where drive reliability becomes significantly degraded. The high temperature could be due to impending drive failure. I've seen that exact situation with a failing WD notebook drive. Lots of read failures, and it got very hot. The same model replacement drive ran normally, just warm.
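A rough sketch of what "reading the drive with dd" can look like when narrowed to a range of LBAs. It assumes 512-byte sectors and relies on dd exiting non-zero when a read fails; scan_lbas is a hypothetical helper name. The range in the usage comment is only an illustration built around the LBA 5566440 from the SMART error log mentioned earlier in the thread.

```shell
#!/bin/sh
# scan_lbas DEVICE FIRST LAST: read each 512-byte sector in the range
# with dd and print the LBA of every sector that fails to read.
# Read-only: nothing is ever written to DEVICE.
# Assumption: 512-byte logical sectors, so bs=512 skip=N addresses LBA N.
scan_lbas() {
    dev=$1
    lba=$2
    last=$3
    while [ "$lba" -le "$last" ]; do
        # dd returns non-zero if the sector cannot be read
        if ! dd if="$dev" of=/dev/null bs=512 skip="$lba" count=1 2>/dev/null; then
            echo "unreadable LBA: $lba"
        fi
        lba=$((lba + 1))
    done
}

# Usage (as root), checking the neighbourhood of a known-bad LBA:
#   scan_lbas /dev/ad2 5566400 5566500
```

One dd per sector is slow, so this is only practical for a small window. For a whole disk it is probably better to read with a large block size first and fall back to single sectors only around the spots where that read fails.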
Re: bad sector in gmirror HDD
On Aug 20, 2011, at 06:24 , Jeremy Chadwick wrote:

> You might also be wondering "that dd command writes 512 bytes of zero to
> that LBA; what about the old data that was there, in the case that the
> drive remaps the LBA?"

If you write zeros at OS level to an LBA, you will end up with zeros at that LBA. What else did you expect???

The already remapped LBAs in ATA are not visible anymore to the user/OS. You get a perfectly readable sector. Of course not at the original location, but as you confirmed we are done with CHS addressing.

The pending bad sectors are almost always 'corrected', that is, remapped when you write to that LBA. So your script will find only one unreadable sector and that will be the sector that is pending reallocation.

It may be that writing zeros to all free space, like

dd if=/dev/zero of=/filesystem/zero bs=1m; rm /filesystem/zero

is enough to remap the pending bad block and not have any unreadable sectors. But if the unreadable sector is in a file or directory -- bad luck -- these will need to be rewritten.

Once upon a time, BSD/OS had a wonderful disk 'repair' utility. It could detect failing disks by reading every sector (had a nice visual), or could re-write the drive by reading and writing back every sector. On bad blocks it would retry lots of times and eventually average what was read (with error). Having said that, I doubt modern ATA drives will let anything be read from the pending bad block, but.. who knows.

Daniel
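For the single-sector case, a sketch of the kind of dd invocation being discussed. The exact command from the earlier message is not reproduced in this part of the thread, so the form below (512-byte sectors, seek= to address the LBA) is an assumption. zero_lba_cmd is a hypothetical helper that only prints the command, so nothing gets overwritten by accident.

```shell
#!/bin/sh
# zero_lba_cmd DEVICE LBA: print (do NOT run) a dd command that would
# overwrite one 512-byte sector with zeros. Whatever was stored in that
# sector is lost once the printed command is actually executed.
# Assumption: 512-byte logical sectors, so bs=512 seek=N addresses LBA N.
zero_lba_cmd() {
    printf 'dd if=/dev/zero of=%s bs=512 seek=%s count=1\n' "$1" "$2"
}

# Usage: review the printed command, then run it by hand as root:
#   zero_lba_cmd /dev/ad2 5566440
```

Printing rather than executing is a deliberate choice here: a mistyped device name or LBA in a write like this is not recoverable, so a second look before running it costs little.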