Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-19 Thread John Baldwin
On Thursday, August 18, 2011 4:09:35 pm Andriy Gapon wrote:
> on 17/08/2011 23:21 Andriy Gapon said the following:
> > It seems like everything starts with some kind of a race between terminating
> > processes in a jail and termination of the jail itself.  This is where the
> > details are very thin so far.  What we see is that a process (http) is in
> > exit(2) syscall, in exit1() function actually, and past the place where
> > P_WEXIT flag is set and even past the place where p_limit is freed and
> > reset to NULL.  At that place the thread calls prison_proc_free(), which
> > calls prison_deref().  Then, we see that in prison_deref() the thread gets
> > a page fault because of what seems like a NULL pointer dereference.
> > That's just the start of the problem and its root cause.
> >
> > Then, trap_pfault() gets invoked and, because addresses close to NULL look
> > like userspace addresses, vm_fault/vm_fault_hold gets called, which in its
> > turn goes on to call vm_map_growstack.  First thing that vm_map_growstack
> > does is a call to lim_cur(), but because p_limit is already NULL, that
> > call results in a NULL pointer dereference and a page fault.  Goto the
> > beginning of this paragraph.
> >
> > So we get this recursion of sorts, which only ends when a stack is
> > exhausted and a CPU generates a double-fault.
> 
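The fault loop described above can be modeled in a few lines of C. This is a toy sketch, not kernel code: the functions only borrow the names trap_pfault(), vm_map_growstack() and lim_cur() from the report, and KSTACK_FRAMES is an arbitrary, hypothetical stand-in for the size of the kernel stack. With p_limit already NULL, every attempt to handle a fault faults again, and the only way out is running out of stack.

```c
#include <stddef.h>

/*
 * Toy model only. KSTACK_FRAMES is a hypothetical budget standing in
 * for the finite kernel stack.
 */
#define KSTACK_FRAMES 64

struct plimit { long cur_stack; };
struct proc { struct plimit *p_limit; };

static int depth;        /* nested "faults" currently on the stack */
static int double_fault; /* set once the stack budget is exhausted */

static void trap_pfault(struct proc *p);

/* Like lim_cur(): reading through a NULL p_limit faults again. */
static long lim_cur(struct proc *p)
{
	if (p->p_limit == NULL) {
		trap_pfault(p);
		return -1;
	}
	return p->p_limit->cur_stack;
}

/* Like vm_map_growstack(): the first thing it does is call lim_cur(). */
static void vm_map_growstack(struct proc *p)
{
	(void)lim_cur(p);
}

/* Like trap_pfault() -> vm_fault(): a near-NULL address looks like a
 * user address, so the handler tries to grow the stack. */
static void trap_pfault(struct proc *p)
{
	if (++depth >= KSTACK_FRAMES) {
		double_fault = 1; /* out of stack: the CPU double-faults */
		return;
	}
	vm_map_growstack(p);
}

/*
 * Start from the initial fault (the NULL dereference in prison_deref()).
 * Returns the nesting depth at which the double fault happens, or -1 if
 * the fault was handled without recursing.
 */
int simulate_fault(int limit_freed)
{
	struct plimit lim = { 8L * 1024 * 1024 };
	struct proc p = { limit_freed ? NULL : &lim };

	depth = 0;
	double_fault = 0;
	trap_pfault(&p);
	return double_fault ? depth : -1;
}
```

With p_limit intact, simulate_fault(0) handles the fault in one pass and returns -1; with it freed, simulate_fault(1) recurses until the budget runs out.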
> BTW, does anyone have an idea why the thread in question would "disappear" from
> kgdb's point of view?
> 
> (kgdb) p cpuid_to_pcpu[2]->pc_curthread->td_tid
> $3 = 102057
> (kgdb) tid 102057
> invalid tid
> 
> info threads also doesn't list the thread.
> 
> Is it because the panic happened while the thread was somewhere in exit1()?

Yes, it is a bug in kgdb that it only walks allproc and not zombproc.  Try this:

Index: kthr.c
===
--- kthr.c  (revision 224879)
+++ kthr.c  (working copy)
@@ -73,11 +73,52 @@ kgdb_thr_first(void)
 	return (first);
 }
 
+static void
+kgdb_thr_add_procs(uintptr_t paddr)
+{
+	struct proc p;
+	struct thread td;
+	struct kthr *kt;
+	CORE_ADDR addr;
+
+	while (paddr != 0) {
+		if (kvm_read(kvm, paddr, &p, sizeof(p)) != sizeof(p)) {
+			warnx("kvm_read: %s", kvm_geterr(kvm));
+			break;
+		}
+		addr = (uintptr_t)TAILQ_FIRST(&p.p_threads);
+		while (addr != 0) {
+			if (kvm_read(kvm, addr, &td, sizeof(td)) !=
+			    sizeof(td)) {
+				warnx("kvm_read: %s", kvm_geterr(kvm));
+				break;
+			}
+			kt = malloc(sizeof(*kt));
+			kt->next = first;
+			kt->kaddr = addr;
+			if (td.td_tid == dumptid)
+				kt->pcb = dumppcb;
+			else if (td.td_state == TDS_RUNNING && stoppcbs != 0 &&
+			    CPU_ISSET(td.td_oncpu, &stopped_cpus))
+				kt->pcb = (uintptr_t)stoppcbs +
+				    sizeof(struct pcb) * td.td_oncpu;
+			else
+				kt->pcb = (uintptr_t)td.td_pcb;
+			kt->kstack = td.td_kstack;
+			kt->tid = td.td_tid;
+			kt->pid = p.p_pid;
+			kt->paddr = paddr;
+			kt->cpu = td.td_oncpu;
+			first = kt;
+			addr = (uintptr_t)TAILQ_NEXT(&td, td_plist);
+		}
+		paddr = (uintptr_t)LIST_NEXT(&p, p_list);
+	}
+}
+
 struct kthr *
 kgdb_thr_init(void)
 {
-	struct proc p;
-	struct thread td;
 	long cpusetsize;
 	struct kthr *kt;
 	CORE_ADDR addr;
@@ -113,37 +154,11 @@ kgdb_thr_init(void)
 
 	stoppcbs = kgdb_lookup("stoppcbs");
 
-	while (paddr != 0) {
-		if (kvm_read(kvm, paddr, &p, sizeof(p)) != sizeof(p)) {
-			warnx("kvm_read: %s", kvm_geterr(kvm));
-			break;
-		}
-		addr = (uintptr_t)TAILQ_FIRST(&p.p_threads);
-		while (addr != 0) {
-			if (kvm_read(kvm, addr, &td, sizeof(td)) !=
-			    sizeof(td)) {
-				warnx("kvm_read: %s", kvm_geterr(kvm));
-				break;
-			}
-			kt = malloc(sizeof(*kt));
-			kt->next = first;
-			kt->kaddr = addr;
-			if (td.td_tid == dumptid)
-				kt->pcb = dumppcb;
-			else if (td.td_state == TDS_RUNNING && stoppcbs != 0 &&
-			    CPU_ISSET(td.td_oncpu, &stopped_cpus))
-				kt

Re: panic: spin lock held too long (RELENG_8 from today)

2011-08-19 Thread Mike Tancsa
On 8/18/2011 8:37 PM, Chip Camden wrote:

>> st> Thanks, Attilio.  I've applied the patch and removed the extra debug
>> st> options I had added (though keeping debug symbols).  I'll let you know if
>> st> I experience any more panics.
>>
>>  No panic for 20 hours at this moment, FYI.  For my NFS server, I
>>  think another 24 hours would be sufficient to confirm the stability.
>>  I will see how it works...
>>
>> -- Hiroki
> 
> Likewise:
> 
> $ uptime
>  5:37PM  up 21:45, 5 users, load averages: 0.68, 0.45, 0.63
> 
> So far, so good (knocks on head).
> 


0(ns4)% uptime
 8:55AM  up 22:39, 3 users, load averages: 0.01, 0.00, 0.00
0(ns4)%


So far so good for me too

---Mike

-- 
---
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, m...@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada   http://www.tancsa.com/
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: panic: spin lock held too long (RELENG_8 from today)

2011-08-19 Thread Chip Camden
Quoth Mike Tancsa on Friday, 19 August 2011:
> On 8/18/2011 8:37 PM, Chip Camden wrote:
> 
> >> st> Thanks, Attilio.  I've applied the patch and removed the extra debug
> >> st> options I had added (though keeping debug symbols).  I'll let you know if
> >> st> I experience any more panics.
> >>
> >>  No panic for 20 hours at this moment, FYI.  For my NFS server, I
> >>  think another 24 hours would be sufficient to confirm the stability.
> >>  I will see how it works...
> >>
> >> -- Hiroki
> > 
> > Likewise:
> > 
> > $ uptime
> >  5:37PM  up 21:45, 5 users, load averages: 0.68, 0.45, 0.63
> > 
> > So far, so good (knocks on head).
> > 
> 
> 
> 0(ns4)% uptime
>  8:55AM  up 22:39, 3 users, load averages: 0.01, 0.00, 0.00
> 0(ns4)%
> 
> 
> So far so good for me too
> 
>   ---Mike
> 
> -- 
> ---
> Mike Tancsa, tel +1 519 651 3400
> Sentex Communications, m...@sentex.net
> Providing Internet services since 1994 www.sentex.net
> Cambridge, Ontario Canada   http://www.tancsa.com/

Still up and running here.

 8:02AM  up 1 day, 12:10, 4 users, load averages: 0.08, 0.26, 0.52

After the panics began, I never went more than 12 hours without one before
applying this patch.  I think you nailed it, Attilio.  Or at least, you
moved it.

-- 
.O. | Sterling (Chip) Camden  | http://camdensoftware.com
..O | sterl...@camdensoftware.com | http://chipsquips.com
OOO | 2048R/D6DBAF91  | http://chipstips.com




Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-19 Thread Andriy Gapon
on 19/08/2011 15:14 John Baldwin said the following:
> Yes, it is a bug in kgdb that it only walks allproc and not zombproc.  Try
> this:

The patch worked perfectly well for me, thank you!

> Index: kthr.c
> ===
> --- kthr.c	(revision 224879)
> +++ kthr.c	(working copy)
> @@ -73,11 +73,52 @@ kgdb_thr_first(void)
>  	return (first);
>  }
>  
> +static void
> +kgdb_thr_add_procs(uintptr_t paddr)
> +{
> +	struct proc p;
> +	struct thread td;
> +	struct kthr *kt;
> +	CORE_ADDR addr;
> +
> +	while (paddr != 0) {
> +		if (kvm_read(kvm, paddr, &p, sizeof(p)) != sizeof(p)) {
> +			warnx("kvm_read: %s", kvm_geterr(kvm));
> +			break;
> +		}
> +		addr = (uintptr_t)TAILQ_FIRST(&p.p_threads);
> +		while (addr != 0) {
> +			if (kvm_read(kvm, addr, &td, sizeof(td)) !=
> +			    sizeof(td)) {
> +				warnx("kvm_read: %s", kvm_geterr(kvm));
> +				break;
> +			}
> +			kt = malloc(sizeof(*kt));
> +			kt->next = first;
> +			kt->kaddr = addr;
> +			if (td.td_tid == dumptid)
> +				kt->pcb = dumppcb;
> +			else if (td.td_state == TDS_RUNNING && stoppcbs != 0 &&
> +			    CPU_ISSET(td.td_oncpu, &stopped_cpus))
> +				kt->pcb = (uintptr_t)stoppcbs +
> +				    sizeof(struct pcb) * td.td_oncpu;
> +			else
> +				kt->pcb = (uintptr_t)td.td_pcb;
> +			kt->kstack = td.td_kstack;
> +			kt->tid = td.td_tid;
> +			kt->pid = p.p_pid;
> +			kt->paddr = paddr;
> +			kt->cpu = td.td_oncpu;
> +			first = kt;
> +			addr = (uintptr_t)TAILQ_NEXT(&td, td_plist);
> +		}
> +		paddr = (uintptr_t)LIST_NEXT(&p, p_list);
> +	}
> +}
> +
>  struct kthr *
>  kgdb_thr_init(void)
>  {
> -	struct proc p;
> -	struct thread td;
>  	long cpusetsize;
>  	struct kthr *kt;
>  	CORE_ADDR addr;
> @@ -113,37 +154,11 @@ kgdb_thr_init(void)
>  
>  	stoppcbs = kgdb_lookup("stoppcbs");
>  
> -	while (paddr != 0) {
> -		if (kvm_read(kvm, paddr, &p, sizeof(p)) != sizeof(p)) {
> -			warnx("kvm_read: %s", kvm_geterr(kvm));
> -			break;
> -		}
> -		addr = (uintptr_t)TAILQ_FIRST(&p.p_threads);
> -		while (addr != 0) {
> -			if (kvm_read(kvm, addr, &td, sizeof(td)) !=
> -			    sizeof(td)) {
> -				warnx("kvm_read: %s", kvm_geterr(kvm));
> -				break;
> -			}
> -			kt = malloc(sizeof(*kt));
> -			kt->next = first;
> -			kt->kaddr = addr;
> -			if (td.td_tid == dumptid)
> -				kt->pcb = dumppcb;
> -			else if (td.td_state == TDS_RUNNING && stoppcbs != 0 &&
> -			    CPU_ISSET(td.td_oncpu, &stopped_cpus))
> -				kt->pcb = (uintptr_t) stoppcbs + sizeof(struct pcb) * td.td_oncpu;
> -			else
> -				kt->pcb = (uintptr_t)td.td_pcb;
> -			kt->kstack = td.td_kstack;
> -			kt->tid = td.td_tid;
> -			kt->pid = p.p_pid;
> -			kt->paddr = paddr;
> -			kt->cpu = td.td_oncpu;
> -			first = kt;
> -			addr = (uintptr_t)TAILQ_NEXT(&td, td_plist);
> -		}
> -		paddr = (uintptr_t)LIST_NEXT(&p, p_list);
> +	kgdb_thr_add_procs(paddr);
> +	addr = kgdb_lookup("zombproc");
> +	if (addr != 0) {
> +		kvm_read(kvm, addr, &paddr, sizeof(paddr));
> +		kgdb_thr_add_procs(paddr);
>  	}
>  	curkthr = kgdb_thr_lookup_tid(dumptid);
>  	if (curkthr == NULL)
> 
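Why the patch helps: kgdb builds its thread list by walking allproc only, while a process deep inside exit1() has (as this case implies) already been moved to zombproc, so its still-running thread never shows up in "info threads" or "tid". The user-space sketch below models just that list walk with simplified, hypothetical structures (plain next pointers instead of the kernel's LIST/TAILQ linkage, and no kvm_read()). The tid 102057 comes from the kgdb session earlier in the thread; the other IDs are made up.

```c
#include <stdlib.h>

/*
 * Hypothetical stand-ins: real kgdb reads struct proc / struct thread out
 * of the dump with kvm_read(); plain next pointers replace the kernel's
 * p_list / td_plist linkage here.
 */
struct thread {
	int td_tid;
	struct thread *td_next;   /* ~ TAILQ_NEXT(td, td_plist) */
};

struct proc {
	int p_pid;
	struct thread *p_threads; /* ~ TAILQ_FIRST(&p->p_threads) */
	struct proc *p_next;      /* ~ LIST_NEXT(p, p_list) */
};

struct kthr {
	struct kthr *next;
	int tid;
	int pid;
};

/* Same shape as kgdb_thr_add_procs(): prepend one entry per thread of
 * every process on a single process list. */
static struct kthr *add_procs(struct kthr *first, const struct proc *p)
{
	for (; p != NULL; p = p->p_next) {
		for (const struct thread *td = p->p_threads; td != NULL;
		    td = td->td_next) {
			struct kthr *kt = malloc(sizeof(*kt));

			if (kt == NULL)
				return first;
			kt->next = first;
			kt->tid = td->td_tid;
			kt->pid = p->p_pid;
			first = kt;
		}
	}
	return first;
}

static const struct kthr *lookup_tid(const struct kthr *kt, int tid)
{
	for (; kt != NULL; kt = kt->next)
		if (kt->tid == tid)
			return kt;
	return NULL;
}

static void free_kthrs(struct kthr *kt)
{
	while (kt != NULL) {
		struct kthr *next = kt->next;

		free(kt);
		kt = next;
	}
}

/*
 * Returns 1 if tid is visible after walking allproc only
 * (walk_zombproc == 0) or allproc and then zombproc (walk_zombproc != 0).
 */
int tid_visible(int walk_zombproc, int tid)
{
	/* A normal process still on allproc. */
	struct thread live_td = { 100001, NULL };
	struct proc live = { 10, &live_td, NULL };
	/* A process inside exit1() that is already on zombproc. */
	struct thread exiting_td = { 102057, NULL };
	struct proc exiting = { 20, &exiting_td, NULL };
	struct kthr *first;
	int found;

	first = add_procs(NULL, &live);          /* allproc */
	if (walk_zombproc)
		first = add_procs(first, &exiting); /* zombproc */
	found = lookup_tid(first, tid) != NULL;
	free_kthrs(first);
	return found;
}
```

Walking allproc alone reproduces the "invalid tid" symptom for 102057; adding the zombproc pass makes it visible, which is what the second kgdb_thr_add_procs() call in the patch does.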
>> is there an easy way to examine its stack in this case?
> 
> Hmm, you can use something like this from my kgdb macros.

Oh, I completely forgot about them.
I hope I will remember where to search for the tricks next time I need them :-)
Thank you again!

> For amd64:
> 
> # Do a backtrace given %rip and %rbp as args
> define bt
> set $_rip = $arg0
> set $_rbp = $arg1
> set $i = 0
> while ($_rbp != 0 || $_rip != 0)
> 	printf "%2d: pc ", $i
> 	if ($_rip != 0)
> 		x/1i $_rip
> 	else
> 		printf "\n"
> 	end
> 	if ($_rbp == 0)
> 		set $_rip = 0
> 	else
> 		set $fr = (struct amd64_frame *)$_rbp
> 		set $_rbp = $fr->f_frame
>
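The macro is cut off in this archive, but its visible part follows the usual frame-pointer chain: each amd64 frame starts with the caller's saved %rbp, followed by the return address. A hedged C model of the same walk, assuming the missing lines set $_rip from the frame's return address (called f_retaddr here) before stepping to f_frame; struct frame is a simplified stand-in, not the kernel's struct amd64_frame.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the start of an amd64 stack frame. */
struct frame {
	const struct frame *f_frame; /* caller's saved %rbp */
	uintptr_t f_retaddr;         /* return address pushed by call */
};

/*
 * Follow the %rbp chain from (rip, rbp) and store each pc in pcs[],
 * at most max of them. Returns how many were stored. Same loop shape
 * as the bt macro: stop once both rbp and rip are zero.
 */
size_t walk_frames(uintptr_t rip, const struct frame *rbp,
    uintptr_t *pcs, size_t max)
{
	size_t n = 0;

	while ((rbp != NULL || rip != 0) && n < max) {
		pcs[n++] = rip;
		if (rbp == NULL) {
			rip = 0;              /* no frame left to unwind */
		} else {
			rip = rbp->f_retaddr; /* assumed: read before moving on */
			rbp = rbp->f_frame;
		}
	}
	return n;
}
```

The max bound matters on a corrupted stack like the one in this thread, where a broken %rbp chain could otherwise loop forever.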

Re: USB/coredump hangs in 8 and 9

2011-08-19 Thread Andriy Gapon
on 19/08/2011 00:24 Hans Petter Selasky said the following:
> On Thursday 18 August 2011 19:04:10 Andriy Gapon wrote:
>> If you can help Hans to figure out what is wrong with the USB subsystem in
>> this respect, that would help us all.
> 
> Hi,
> 
> usb_busdma.c:   /* we use "mtx_owned()" instead of this function */
> usb_busdma.c:   owned = mtx_owned(uptag->mtx);
> usb_compat_linux.c: do_unlock = mtx_owned(&Giant) ? 0 : 1;
> usb_compat_linux.c: do_unlock = mtx_owned(&Giant) ? 0 : 1;
> usb_compat_linux.c: do_unlock = mtx_owned(&Giant) ? 0 : 1;
> usb_hub.c:  if (mtx_owned(&bus->bus_mtx)) {
> usb_transfer.c: if (!mtx_owned(info->xfer_mtx)) {
> usb_transfer.c: if (mtx_owned(xfer->xroot->xfer_mtx)) {
> usb_transfer.c: while (mtx_owned(&xroot->udev->bus->bus_mtx)) {
> usb_transfer.c: while (mtx_owned(xroot->xfer_mtx)) {
> 
> One fix you will need to do, if mtx_owned is not giving correct value is:

First, could you please clarify what the correct, or rather expected, value is
in this case.  It's not immediately clear to me whether we should consider all
locks as owned or un-owned in a situation where all locks are actually skipped
behind the scenes.
Maybe the USB code should explicitly check for that condition so as not to make
any unsafe assumptions.

Second, it's not clear to me what the above list actually represents in the
context of this discussion.

> static void
> usbd_callback_wrapper(struct usb_xfer_queue *pq)
> {
> 	struct usb_xfer *xfer = pq->curr;
> 	struct usb_xfer_root *info = xfer->xroot;
> 
> 	USB_BUS_LOCK_ASSERT(info->bus, MA_OWNED);
> 	if (!mtx_owned(info->xfer_mtx)) {
> 
> The above "if" should be anded with && !paniced && !dumping ... or maybe the 
> new not scheduling variable is good for this purpose?
> 
> 		/*
> 		 * Cases that end up here:
> 		 *
> 
> #if USB_HAVE_BUSDMA
> if (mtx_owned(xfer->xroot->xfer_mtx)) {
> struct usb_xfer_queue *pq;
> 
> 
> This case is more like a BUS-DMA error case, and is not so important to execute.
> 
> --HPS
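One way to make the explicit check Andriy asks for, instead of relying on whatever mtx_owned() reports while locking is being bypassed, is to test the panic/dump state before the ownership test, along the lines HPS proposes. A minimal sketch with a hypothetical state struct; real code would read panicstr, dumping and, with the proposed scheduler patch, SCHEDULER_STOPPED() directly.

```c
#include <stdbool.h>

/*
 * Hypothetical snapshot of the state the decision depends on. In the
 * kernel these would come from mtx_owned(info->xfer_mtx), panicstr,
 * dumping and (with the proposed patch) SCHEDULER_STOPPED().
 */
struct cb_state {
	bool xfer_mtx_owned;
	bool panicking;
	bool dumping;
	bool scheduler_stopped;
};

/*
 * Sketch of the suggested condition: only take the branch guarded by
 * !mtx_owned() in normal operation. While locks are being skipped behind
 * the scenes, the mtx_owned() answer is not consulted at all.
 */
bool take_unowned_branch(const struct cb_state *s)
{
	if (s->panicking || s->dumping || s->scheduler_stopped)
		return false;
	return !s->xfer_mtx_owned;
}
```

Checking the global state first sidesteps the question of whether a skipped lock should count as owned.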


-- 
Andriy Gapon


Re: USB/coredump hangs in 8 and 9

2011-08-19 Thread Attilio Rao
2011/8/12 Andrew Boyer :
> Re: panic: bufwrite: buffer is not busy??? (originally on freebsd-net)
> Re: debugging frequent kernel panics on 8.2-RELEASE (originally on freebsd-stable)
> Re: System hang in USB umass module while processing panic (originally on freebsd-usb)
>
> Hello Andriy and Hans,
>
> Sorry for tying in so many discussions on this topic, but I think I have an 
> explanation for the problems we have been reporting* with hanging coredumps 
> on multicore systems on 8.2-RELEASE, and it has implications for Andriy's 
> proposed scheduler patch** and for USB.
>
> In today's 8.X and 9.X branches, nothing that I can find stops the other CPUs 
> when the kernel panics, but many parts of the locking code get disabled (grep 
> on 'panicstr').  The 'bufwrite: buffer is not busy???' panic is caused by the 
> syncer encountering an error.  If that happens when it's on the dumping CPU 
> everything hangs.  If it's running on a different CPU, it will be blocked and 
> hidden by the panic_cpu spinlock in panic(), and the dump continues, polling 
> every attached keyboard for a Ctl-C.
>
> But, the new 8.X USB stack relies on multithreading.  (The new stack is the 
> variable that broke coredumps for us in the 7.1->8.2 transition, I think.)  
> SVN 224223 fixes a hang that would happen when dumpsys() polls the USB 
> keyboard (IPMI KVM, in our case).  That helps, but it only gets as far as 
> usb_process(), where it hangs in a loop around a cv_wait() call.  This is 
> easy to reproduce by adding code to the watchdog to break into the debugger 
> if panicstr is set.
>
> I am experimenting with Andriy's patch** to stop the scheduler and it seems 
> to be most of the way there, stopping the CPUs and disabling the rest of 
> locking.  There are a few places that still reference panicstr, but that's 
> minor.  These are the changes I made to the patch:
>  * Changed ukbd_do_poll() to return immediately if SCHEDULER_STOPPED() is 
> true, so that we don't hang up in USB.  ukbd_yield()  locks up in 
> DROP_GIANT(), and if you skip ukbd_yield(), usbd_transfer_poll() locks up 
> trying to drop mutexes.
>  * Changed the call to spinlock_enter() back to critical_enter(), so that 
> interrupts stay enabled and the hardclock still functions.

Which spinlock_enter() are you referring to here?
I think that having fast interrupt handlers running during
panic/shutdown is something we should avoid like hell.

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


bad sector in gmirror HDD

2011-08-19 Thread Dan Langille
System in question: FreeBSD 8.2-STABLE #3: Thu Mar  3 04:52:04 GMT 2011

After a recent power failure, I'm seeing this in my logs:

Aug 19 20:36:34 bast smartd[1575]: Device: /dev/ad2, 2 Currently unreadable (pending) sectors

And gmirror reports:

# gmirror status
      Name    Status  Components
mirror/gm0  DEGRADED  ad0 (100%)
                      ad2

I think the solution is: gmirror rebuild

Comments?



Searching on that error message, I was led to believe that identifying the bad
sector and running dd to read it would cause the HDD to reallocate that bad block.

  http://smartmontools.sourceforge.net/badblockhowto.html

However, since ad2 is one half of a gmirror, I don't think this is the best approach.

Comments?




More information:

smartd, gpart, dh, diskinfo, and fdisk output at 
http://beta.freebsddiary.org/smart-fixing-bad-sector.php

also:

# gmirror list
Geom name: gm0
State: DEGRADED
Components: 2
Balance: round-robin
Slice: 4096
Flags: NONE
GenID: 0
SyncID: 1
ID: 3362720654
Providers:
1. Name: mirror/gm0
   Mediasize: 40027028992 (37G)
   Sectorsize: 512
   Mode: r6w5e14
Consumers:
1. Name: ad0
   Mediasize: 40027029504 (37G)
   Sectorsize: 512
   Mode: r1w1e1
   State: SYNCHRONIZING
   Priority: 0
   Flags: DIRTY, SYNCHRONIZING
   GenID: 0
   SyncID: 1
   Synchronized: 100%
   ID: 949692477
2. Name: ad2
   Mediasize: 40027029504 (37G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 0
   Flags: DIRTY, BROKEN
   GenID: 0
   SyncID: 1
   ID: 3585934016
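The sizes in the gmirror list output are consistent with each other: the mirror provider is exactly one 512-byte sector smaller than each component, which fits gmirror(8) keeping its metadata in the last sector of every component. The same arithmetic gives the last addressable LBA on ad2 (78177791), the upper bound for any sector-by-sector scan of that disk.

```latex
\frac{40027029504\ \mathrm{B}}{512\ \mathrm{B/sector}} = 78177792\ \text{sectors} \quad (\text{LBA } 0 \ldots 78177791)

40027029504\ \mathrm{B} - 40027028992\ \mathrm{B} = 512\ \mathrm{B} = 1\ \text{sector}

\frac{40027029504\ \mathrm{B}}{2^{30}\ \mathrm{B/GiB}} \approx 37.28 \;\Rightarrow\; \text{``37G''}
```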



-- 
Dan Langille - http://langille.org



Re: bad sector in gmirror HDD

2011-08-19 Thread Chuck Swiger
On Aug 19, 2011, at 1:50 PM, Dan Langille wrote:
> Searching on that error message, I was led to believe that identifying the
> bad sector and running dd to read it would cause the HDD to reallocate that
> bad block.
> 
>  http://smartmontools.sourceforge.net/badblockhowto.html
> 
> However, since ad2 is one half of a gmirror, I don't think this is the best
> approach.
> 
> Comments?

Reading the underlying failing drive with dd will help identify any other 
questionable sectors.  However, your drive temps are too high-- many vendors 
call out either 50C or 55C as the point where drive reliability becomes 
significantly degraded.

Regards,
-- 
-Chuck



Re: bad sector in gmirror HDD

2011-08-19 Thread Jeremy Chadwick
On Fri, Aug 19, 2011 at 04:50:01PM -0400, Dan Langille wrote:
> System in question: FreeBSD 8.2-STABLE #3: Thu Mar  3 04:52:04 GMT 2011
> 
> After a recent power failure, I'm seeing this in my logs:
> 
> Aug 19 20:36:34 bast smartd[1575]: Device: /dev/ad2, 2 Currently unreadable (pending) sectors

I doubt this is related to a power failure.

> Searching on that error message, I was led to believe that identifying the
> bad sector and running dd to read it would cause the HDD to reallocate that
> bad block.
> 
>   http://smartmontools.sourceforge.net/badblockhowto.html

This is incorrect (meaning you've misunderstood what's written there).

Unreadable LBAs can be a result of the LBA being actually bad (as in
uncorrectable), or the LBA being marked "suspect".  In either case the
LBA will return an I/O error when read.

If the LBAs are marked "suspect", the drive will perform re-analysis of
the LBA (to determine if the LBA can be read and the data re-mapped, or
if it cannot then the LBA is marked uncorrectable) when you **write** to
the LBA.
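Because re-analysis is only triggered by a write, the offset has to go on the output side of dd(1) (seek=), whereas a read test puts it on the input side (skip=). Swapping the two silently targets LBA 0 instead: skip= on a write skips into /dev/zero and writes at the start of the disk, and seek= on a read seeks in /dev/null and reads from the start of the disk. A small, hypothetical C helper that only formats the two command lines makes the difference explicit; the device name and LBA in the test are made up, and the write command, if actually run, overwrites that sector with zeros.

```c
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/*
 * Format the dd(1) command that READS one sector at a given LBA. The
 * offset is on the input side, so it goes in skip=. Returns snprintf()'s
 * result, or -1 for a device path that would not survive as one shell word.
 */
int dd_read_cmd(char *buf, size_t len, const char *dev, uint64_t lba,
    unsigned int bs)
{
	if (dev == NULL || strpbrk(dev, " \t'\"") != NULL)
		return -1;
	return snprintf(buf, len,
	    "dd if=%s of=/dev/null bs=%u count=1 skip=%" PRIu64, dev, bs, lba);
}

/*
 * Format the dd(1) command that WRITES zeros over one sector at a given
 * LBA. The offset is on the output side, so it goes in seek=. Running the
 * resulting command destroys whatever that sector held.
 */
int dd_write_cmd(char *buf, size_t len, const char *dev, uint64_t lba,
    unsigned int bs)
{
	if (dev == NULL || strpbrk(dev, " \t'\"") != NULL)
		return -1;
	return snprintf(buf, len,
	    "dd if=/dev/zero of=%s bs=%u count=1 seek=%" PRIu64, dev, bs, lba);
}
```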

The above smartd output doesn't tell me much.  Providing actual SMART
attribute data (smartctl -a) for the drive would help.  The brand of the
drive, the firmware version, and the model all matter -- every drive
behaves a little differently.

Furthermore, if the LBA is re-analysed and determined to be
uncorrectable -- regardless of remapping -- this doesn't actually fix
I/O errors on a filesystem level.  The filesystem itself (and more often
than not in the data section of the file/inode, so things like fsck
can't work around this) can still contain references to the LBA which is
uncorrectable, and will still continue to return I/O errors when read.
There has to be a way to tell the filesystem, when formatted, "avoid use
of this LBA".  How UFS/FFS handles this is unknown to me.  I know of
badsect(8) but I don't know if this works.  "Transparent" remapping I
have never seen work except on SSDs.

If you want me to step you through the procedure of re-testing the LBAs
(assuming they're suspect and not uncorrectable) I can do so, just ask.
Finding the suspect LBAs can be done using a dd loop (I wrote a shell
script for this), or using "smartctl -t select,0-max /dev/XXX" and let
the drive's internal selective test see if it can find them.  From there
it's an issue of submitting a write request to the LBA and seeing what
happens (I do this via dd as well, but the parameters you pass it are
very specific, e.g. don't mix up/misunderstand seek vs. skip).
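Jeremy's dd-loop script isn't included in the thread. A read-only C equivalent of the idea, as a hypothetical sketch: read one sector at a time with pread(2) and note every LBA whose read fails (on a pending sector this is typically EIO) or comes back short. It never writes, so it only locates candidates; the write step that forces re-analysis is separate.

```c
#define _POSIX_C_SOURCE 200809L /* pread() under strict C modes */
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read nsectors sectors starting at first_lba, one at a time, and record
 * every LBA whose read fails or comes back short. Returns the number of
 * such LBAs (up to max_found of them are stored in found[]), or -1 if
 * the arguments are bad or the device cannot be opened.
 */
long scan_unreadable(const char *path, unsigned int secsize,
    uint64_t first_lba, uint64_t nsectors, uint64_t *found, size_t max_found)
{
	char buf[4096];
	long bad = 0;
	int fd;

	if (secsize == 0 || secsize > sizeof(buf))
		return -1;
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	for (uint64_t lba = first_lba; lba < first_lba + nsectors; lba++) {
		ssize_t n = pread(fd, buf, secsize, (off_t)(lba * secsize));

		if (n == (ssize_t)secsize)
			continue;   /* sector read fine */
		if (n == 0)
			break;      /* past the end of the device */
		/* n < 0 (typically EIO) or a short read */
		if ((size_t)bad < max_found)
			found[bad] = lba;
		bad++;
	}
	close(fd);
	return bad;
}
```

Reading sector by sector is slow on a whole disk, but it keeps each failure tied to a single LBA, which is what the later write step needs.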

I've assisted with this time and time again for folks on forums with
varying success.

I've also found some models of drives which claim there are suspect LBAs
yet an internal surface scan passes with no issues (and these are drives
which I myself have; the only difference between my drives and the
individuals' drives is firmware, which leads me to believe there is a
firmware bug on some drives in the field).

-- 
| Jeremy Chadwickjdc at parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator   Mountain View, CA, US |
| Making life hard for others since 1977.   PGP 4BD6C0CB |



Re: panic: spin lock held too long (RELENG_8 from today)

2011-08-19 Thread Attilio Rao
If nobody complains about it earlier, I'll propose the patch to re@ in 8 hours.

Attilio

2011/8/19 Mike Tancsa :
> On 8/18/2011 8:37 PM, Chip Camden wrote:
>
>>> st> Thanks, Attilio.  I've applied the patch and removed the extra debug
>>> st> options I had added (though keeping debug symbols).  I'll let you know if
>>> st> I experience any more panics.
>>>
>>>  No panic for 20 hours at this moment, FYI.  For my NFS server, I
>>>  think another 24 hours would be sufficient to confirm the stability.
>>>  I will see how it works...
>>>
>>> -- Hiroki
>>
>> Likewise:
>>
>> $ uptime
>>  5:37PM  up 21:45, 5 users, load averages: 0.68, 0.45, 0.63
>>
>> So far, so good (knocks on head).
>>
>
>
> 0(ns4)% uptime
>  8:55AM  up 22:39, 3 users, load averages: 0.01, 0.00, 0.00
> 0(ns4)%
>
>
> So far so good for me too
>
>        ---Mike
>
> --
> ---
> Mike Tancsa, tel +1 519 651 3400
> Sentex Communications, m...@sentex.net
> Providing Internet services since 1994 www.sentex.net
> Cambridge, Ontario Canada   http://www.tancsa.com/
>



-- 
Peace can only be achieved by understanding - A. Einstein


Re: bad sector in gmirror HDD

2011-08-19 Thread Diane Bruce
On Fri, Aug 19, 2011 at 04:50:01PM -0400, Dan Langille wrote:
> System in question: FreeBSD 8.2-STABLE #3: Thu Mar  3 04:52:04 GMT 2011
> 
> After a recent power failure, I'm seeing this in my logs:
> 
> Aug 19 20:36:34 bast smartd[1575]: Device: /dev/ad2, 2 Currently unreadable (pending) sectors
> 

Personally, I'd replace that drive now. 

> Searching on that error message, I was led to believe that identifying the
> bad sector and running dd to read it would cause the HDD to reallocate that
> bad block.

No, as otherwise mentioned (Hi Jeremy!) you need to read and write the
block. This could buy you a few more days or a few more weeks. Personally,
I would not wait. Your call.
 
> Comments?
...
> Dan Langille - http://langille.org

- Diane
-- 
- d...@freebsd.org d...@db.net http://www.db.net/~db
  Why leave money to our children if we don't leave them the Earth?


Re: bad sector in gmirror HDD

2011-08-19 Thread Kevin Oberman
On Fri, Aug 19, 2011 at 4:57 PM, Diane Bruce  wrote:
> On Fri, Aug 19, 2011 at 04:50:01PM -0400, Dan Langille wrote:
>> System in question: FreeBSD 8.2-STABLE #3: Thu Mar  3 04:52:04 GMT 2011
>>
>> After a recent power failure, I'm seeing this in my logs:
>>
>> Aug 19 20:36:34 bast smartd[1575]: Device: /dev/ad2, 2 Currently unreadable (pending) sectors
>>
>
> Personally, I'd replace that drive now.
>
>> Searching on that error message, I was led to believe that identifying the
>> bad sector and running dd to read it would cause the HDD to reallocate that
>> bad block.
>
> No, as otherwise mentioned (Hi Jeremy!) you need to read and write the
> block. This could buy you a few more days or a few more weeks. Personally,
> I would not wait. Your call.
>

While I largely agree, it depends on several factors as to whether I'd
replace the drive.

First, what does SMART show other than these errors?  If the reported
statistics look generally good, and considering that you have a mirror with
one "good" copy of the blocks in question, the impact is zero unless
the other drive fails. That is why the blocks need to be re-written so
that they will be re-located on the drive.

Second, how critical is the data? The mirror gives good integrity, but
you also need good backups. If the data MUST be on-line with high
reliability, buy a replacement drive. You need to look at cost-benefit
(or really the cost of replacement vs. cost of failure).

It's worth mentioning that all drives have bad blocks. Most are hard
bad blocks and are re-mapped before the drive is shipped, but marginal
bad blocks can and do slip through to customers and it is entirely
possible that the drive is just fine for the most part and replacing
it is really a waste of money.

Only you can make the call, but if further bad blocks show up in the
near term, I'll go along with recommending replacement.

-- 
R. Kevin Oberman, Network Engineer - Retired
E-mail: kob6...@gmail.com


Re: bad sector in gmirror HDD

2011-08-19 Thread Jeremy Chadwick
On Fri, Aug 19, 2011 at 05:51:02PM -0700, Kevin Oberman wrote:
> On Fri, Aug 19, 2011 at 4:57 PM, Diane Bruce  wrote:
> > On Fri, Aug 19, 2011 at 04:50:01PM -0400, Dan Langille wrote:
> >> System in question: FreeBSD 8.2-STABLE #3: Thu Mar  3 04:52:04 GMT 2011
> >>
> >> After a recent power failure, I'm seeing this in my logs:
> >>
> >> Aug 19 20:36:34 bast smartd[1575]: Device: /dev/ad2, 2 Currently unreadable (pending) sectors
> >>
> >
> > Personally, I'd replace that drive now.
> >
> >> Searching on that error message, I was led to believe that identifying the
> >> bad sector and running dd to read it would cause the HDD to reallocate that
> >> bad block.
> >
> > No, as otherwise mentioned (Hi Jeremy!) you need to read and write the
> > block. This could buy you a few more days or a few more weeks. Personally,
> > I would not wait. Your call.
> >
> 
> While I largely agree, it depends on several factors as to whether I'd
> replace the drive.
> 
> First, what does SMART show other than these errors?  If the reported
> statistics look generally good, and considering that you have a mirror with
> one "good" copy of the blocks in question, the impact is zero unless
> the other drive fails. That is why the blocks need to be re-written so
> that they will be re-located on the drive.
> 
> Second, how critical is the data? The mirror gives good integrity, but
> you also need good backups. If the data MUST be on-line with high
> reliability, buy a replacement drive. You need to look at cost-benefit
> (or really the cost of replacement vs. cost of failure).
> 
> It's worth mentioning that all drives have bad blocks. Most are hard
> bad blocks and are re-mapped before the drive is shipped, but marginal
> bad blocks can and do slip through to customers and it is entirely
> possible that the drive is just fine for the most part and replacing
> it is really a waste of money.
>
> Only you can make the call, but if further bad blocks show up in the
> near term, I'll go along with recommending replacement.

I can expand a bit on this.

With ATA/SATA and SCSI disks, there's a factory default list of LBAs
which are bad (referred to as the "physical defect list").  Everyone by
now is familiar with this.

With SCSI disks there are "grown defects": a drive-managed AND
user-managed list of LBAs which are considered bad.  Whether these LBAs
were correctable (remapped) or not is tracked by SMART on SCSI.  I can
provide many examples of this if people want to see what it looks like
(we have quite a collection of Fujitsu disks at my workplace.  They're
one of a few vendors I more or less boycott).

With SCSI, you can clear the grown defect list with ease.  Some drives
support clearing the physical defect list too, but doing that requires a
*true* low-level format to be done afterward.  If you issue a
SCSI FORMAT command, any grown defects (as the drive encounters them)
will be "merged" with the physical defect list.  When the FORMAT is
done, the drive will report 0 grown defects.  Again, I can confirm this
exact behaviour with our Fujitsu disks at my workplace; it's easy to get
a list of the physical and grown defects with SCSI.

With ATA/SATA disks it's a different story:

It seems to vary from vendor to vendor and model to model.  The established
theory is that the drive has a list of spare LBAs for remappings, which
is managed entirely by the drive itself -- and not reported back to the
user via SMART or any other means.  This happens entirely without user
intervention, and (on repetitive errors) might show up as the drive
stalling on some I/O or other oddities.  These situations are not
reported back to the OS either -- it's entirely 100% transparent to the
user.

When an ATA/SATA disk begins reporting errors back via SMART, or to the
OS (e.g. I/O error), on certain LBA accesses, then the theory is that
the spare LBA list used by the drive internally has been exhausted, and
it will begin using a different spare list (or an extension of the
existing spares; I'm not sure).

What Diane's getting at (Hi Diane!) is that since the drive is already
at the stage/point of reporting errors back to the OS and SMART, it
means the drive has experienced problems (which it worked around) prior
to this point in time.  Hence her recommendation to replace the drive.

What I still have a bit of trouble stomaching these days is whether or
not the above theories are still used *today* in practise on SATA disks.
Part of me is inclined to believe that **any** errors are reported to
SMART and the OS, and the remapping is reported via SMART, etc.; e.g.
there's no more "transparent" anything.  The problem is that I don't
have a good way to confirm/deny this.

Oh what I'd give for good engineering contacts within Western Digital
and Seagate...

These days, I replace drives depending upon their age (Power_On_Hours)
combined with how many errors are seen and what kind of errors.  For
example, if I have a drive that's been in operation for 

Re: bad sector in gmirror HDD

2011-08-19 Thread Dan Langille

On Aug 19, 2011, at 7:21 PM, Jeremy Chadwick wrote:

> On Fri, Aug 19, 2011 at 04:50:01PM -0400, Dan Langille wrote:
>> System in question: FreeBSD 8.2-STABLE #3: Thu Mar  3 04:52:04 GMT 2011
>> 
>> After a recent power failure, I'm seeing this in my logs:
>> 
>> Aug 19 20:36:34 bast smartd[1575]: Device: /dev/ad2, 2 Currently unreadable (pending) sectors
> 
> I doubt this is related to a power failure.
> 
>> Searching on that error message, I was led to believe that identifying the
>> bad sector and running dd to read it would cause the HDD to reallocate that
>> bad block.
>> 
>>  http://smartmontools.sourceforge.net/badblockhowto.html
> 
> This is incorrect (meaning you've misunderstood what's written there).
> 
> Unreadable LBAs can be a result of the LBA being actually bad (as in
> uncorrectable), or the LBA being marked "suspect".  In either case the
> LBA will return an I/O error when read.
> 
> If the LBAs are marked "suspect", the drive will perform re-analysis of
> the LBA (to determine if the LBA can be read and the data re-mapped, or
> if it cannot then the LBA is marked uncorrectable) when you **write** to
> the LBA.
> 
> The above smartd output doesn't tell me much.  Providing actual SMART
> attribute data (smartctl -a) for the drive would help.  The brand of the
> drive, the firmware version, and the model all matter -- every drive
> behaves a little differently.

Information such as this?  
http://beta.freebsddiary.org/smart-fixing-bad-sector.php


-- 
Dan Langille - http://langille.org

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: panic: spin lock held too long (RELENG_8 from today)

2011-08-19 Thread Hiroki Sato
Attilio Rao  wrote
  in :

at> If nobody complains about it earlier, I'll propose the patch to re@ in 8 hours.

 Running fine for 45 hours so far.  Please go ahead!

-- Hiroki




Re: bad sector in gmirror HDD

2011-08-19 Thread Jeremy Chadwick
On Fri, Aug 19, 2011 at 09:39:17PM -0400, Dan Langille wrote:
> 
> On Aug 19, 2011, at 7:21 PM, Jeremy Chadwick wrote:
> 
> > On Fri, Aug 19, 2011 at 04:50:01PM -0400, Dan Langille wrote:
> >> System in question: FreeBSD 8.2-STABLE #3: Thu Mar  3 04:52:04 GMT 2011
> >> 
> >> After a recent power failure, I'm seeing this in my logs:
> >> 
> >> Aug 19 20:36:34 bast smartd[1575]: Device: /dev/ad2, 2 Currently 
> >> unreadable (pending) sectors
> > 
> > I doubt this is related to a power failure.
> > 
> >> Searching on that error message, I was led to believe that identifying the
> >> bad sector and running dd to read it would cause the HDD to reallocate that
> >> bad block.
> >> 
> >>  http://smartmontools.sourceforge.net/badblockhowto.html
> > 
> > This is incorrect (meaning you've misunderstood what's written there).
> > 
> > Unreadable LBAs can be a result of the LBA being actually bad (as in
> > uncorrectable), or the LBA being marked "suspect".  In either case the
> > LBA will return an I/O error when read.
> > 
> > If the LBAs are marked "suspect", the drive will perform re-analysis of
> > the LBA (to determine if the LBA can be read and the data re-mapped, or
> > if it cannot then the LBA is marked uncorrectable) when you **write** to
> > the LBA.
> > 
> > The above smartd output doesn't tell me much.  Providing actual SMART
> > attribute data (smartctl -a) for the drive would help.  The brand of the
> > drive, the firmware version, and the model all matter -- every drive
> > behaves a little differently.
> 
> Information such as this?  
> http://beta.freebsddiary.org/smart-fixing-bad-sector.php

Yes, perfect.  Thank you.  First things first: upgrade smartmontools to
5.41.  Your attributes will be the same after you do this (the drive is
already in smartmontools' internal drive DB), but I often have to remind
people that they really need to keep smartmontools updated as often as
possible.  The changes between versions are vast; this is especially
important for people with SSDs (I'm responsible for submitting some
recent improvements for Intel 320 and 510 SSDs).

Anyway, the drive (albeit an old PATA Maxtor) appears to have three
anomalies:

1) One confirmed reallocated LBA (SMART attribute 5)

2) One "suspect" LBA (SMART attribute 197)

3) A very high temperature of 51C (SMART attribute 194).  If this drive
is in an enclosure or in a system with no fans this would be
understandable, otherwise this is a bit high.  My home workstation, which
has only one case fan, has a drive with more platters than your Maxtor,
and it idles at ~38C.  Possibly this drive has been undergoing constant
I/O recently (which does greatly increase drive temperature)?  Not sure.
I'm not going to focus too much on this one.
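A minimal Python sketch for pulling those three attributes out of
"smartctl -A" output.  It assumes the usual ten-column attribute table
(ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED,
WHEN_FAILED, RAW_VALUE) and keeps only the leading integer of RAW_VALUE,
since some drives append extra text such as min/max temperatures.  The
function names and the 50C default are illustrative choices, not anything
smartmontools itself provides.

```python
import re

# Assumed column layout of the "smartctl -A" attribute table:
# ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
REALLOCATED, TEMPERATURE, PENDING = 5, 194, 197


def parse_attributes(text):
    """Map attribute ID -> leading integer of RAW_VALUE for each table row."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split(None, 9)  # RAW_VALUE may itself contain spaces
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # header, blank line or other smartctl output
        match = re.match(r"\d+", fields[9])
        if match:
            attrs[int(fields[0])] = int(match.group())
    return attrs


def anomalies(attrs, temp_limit_c=50):
    """List the three kinds of anomaly discussed above (illustrative threshold)."""
    found = []
    if attrs.get(REALLOCATED, 0) > 0:
        found.append(f"{attrs[REALLOCATED]} reallocated LBA(s) (attribute 5)")
    if attrs.get(PENDING, 0) > 0:
        found.append(f"{attrs[PENDING]} pending/suspect LBA(s) (attribute 197)")
    if attrs.get(TEMPERATURE, 0) > temp_limit_c:
        found.append(f"temperature {attrs[TEMPERATURE]}C (attribute 194)")
    return found
```

Feed parse_attributes() the text printed by "smartctl -A /dev/ad2" and
pass the result to anomalies().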

The SMART error log also indicates an LBA failure at the 26000 hour mark
(which is 16 hours prior to when you did smartctl -a /dev/ad2).  Whether
that LBA is the remapped one or the suspect one is unknown.  The LBA was
5566440.
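To look at that one LBA directly, convert it to a byte offset (LBA times
the sector size) and read a single sector there, which is what
"dd if=/dev/ad2 of=/dev/null bs=512 skip=5566440 count=1" does.  Here is a
Python sketch of the same read, assuming 512-byte logical sectors.  A
failing LBA shows up as an OSError (typically EIO).  read_lba and
lba_to_offset are hypothetical helper names.

```python
import os

SECTOR_SIZE = 512  # assumption: 512-byte logical sectors


def lba_to_offset(lba, sector_size=SECTOR_SIZE):
    """Byte offset of an LBA on the whole-disk device."""
    return lba * sector_size


def read_lba(path, lba, sector_size=SECTOR_SIZE):
    """Read exactly one sector; an unreadable LBA raises OSError (usually EIO)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.pread(fd, sector_size, lba_to_offset(lba, sector_size))
    finally:
        os.close(fd)


# lba_to_offset(5566440) -> 2850017280, the offset read_lba("/dev/ad2", 5566440) uses
```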

The SMART tests you did didn't really amount to anything; no surprise.
Short and long tests usually do not test the surface of the disk.  There
are some drives which do it on a long test, but as I said before,
everything varies from drive to drive.

Furthermore, on this model of drive, you cannot do surface scans via
SMART.  Bummer.  That's indicated in the "Offline data collection
capabilities" section at the top, where it reads:

No Selective Self-test supported.

So you'll have to use the dd method.  This takes longer than if surface
scanning were supported by the drive, but is acceptable.  I'll get to how
to go about that in a moment.
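Below is a sketch of a dd-style scan in Python.  It reads the disk in
large chunks and falls back to one-sector reads only when a chunk fails,
to pin down exactly which LBAs are bad.  Like dd with conv=noerror, it
keeps going after errors.  The caller supplies read_at(offset, length)
(for a real disk, something wrapping os.pread on the opened device) and
the total sector count (for example from diskinfo(8)).  The function name
and the 2048-sector chunk size are assumptions for illustration.

```python
def scan_for_bad_lbas(read_at, total_sectors, sector_size=512, chunk_sectors=2048):
    """Return the LBAs whose reads fail with OSError, in ascending order.

    read_at(offset, length) must return bytes or raise OSError; for a real
    disk it could be: lambda off, n: os.pread(fd, n, off)
    """
    bad = []
    lba = 0
    while lba < total_sectors:
        count = min(chunk_sectors, total_sectors - lba)
        try:
            read_at(lba * sector_size, count * sector_size)  # fast path
        except OSError:
            # The chunk failed somewhere: re-read it one sector at a time.
            for one in range(lba, lba + count):
                try:
                    read_at(one * sector_size, sector_size)
                except OSError:
                    bad.append(one)
        lba += count
    return bad
```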

The reallocated LBA cannot be dealt with aside from re-creating the
filesystem and telling it not to use the LBA.  I see no flags in
newfs(8) that indicate a way to specify LBAs to avoid.  And we don't
know what LBA it is so we can't refer to it right now anyway.

As I said previously, I have no idea how UFS/FFS deals with this.  Using
fsck(8) is not sufficient; fsck does not attempt reading every LBA on
the disk or every LBA that makes up the data portions of an inode.  It
only examines the "structure" of the filesystem.  Is it possible the
remapped LBA lived within a structure region and not data?  Yes.  Is it
likely?  Given the size of the disk, probably not.

As mentioned previously too, there's badsect(8) but I don't know if it
works correctly on present-day FreeBSD, if it works with larger drives,
on 64-bit, etc...  You get the idea.  Plus as I said I don't know what
LBA to tell it to avoid.  You also need to keep something in mind: the
terms "sector" and "LBA" are in some ways interchangeable and in other
ways aren't.  I use the term LBA because nobody in their right mind uses
CHS addressing any more.  badsect(8) claims it wants sectors, which I
want to assume are LBAs.
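For reference, here is the conventional CHS-to-LBA mapping.  It shows why
a CHS "sector" number on its own is not an LBA:
LBA = (C * heads + H) * sectors_per_track + (S - 1), where sectors are
numbered from 1.  The small Python sketch below implements that formula
and its inverse.  The 16-head, 63-sector geometry in the comment is only a
common example, not this drive's geometry.

```python
def chs_to_lba(c, h, s, heads, sectors_per_track):
    """Conventional CHS -> LBA mapping; CHS sector numbers start at 1."""
    if not 1 <= s <= sectors_per_track:
        raise ValueError("sector number out of range")
    return (c * heads + h) * sectors_per_track + (s - 1)


def lba_to_chs(lba, heads, sectors_per_track):
    """Inverse mapping, returning (cylinder, head, sector)."""
    cylinder, rest = divmod(lba, heads * sectors_per_track)
    head, sector0 = divmod(rest, sectors_per_track)
    return cylinder, head, sector0 + 1


# With an example 16-head, 63-sector geometry:
# chs_to_lba(1, 0, 1, 16, 63) -> 1008
```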

I hope someone familiar with UFS/FFS can explain how to go about this
process for UFS/FFS.

As for ZFS (because I know someone will ask) -- AFAIK there is no
mechanism

Re: bad sector in gmirror HDD

2011-08-19 Thread Warren Block

On Fri, 19 Aug 2011, Chuck Swiger wrote:

> Reading the underlying failing drive with dd will help identify any
> other questionable sectors.  However, your drive temps are too high--
> many vendors call out either 50C or 55C as the point where drive
> reliability becomes significantly degraded.


The high temperature could be due to impending drive failure.  I've seen 
that exact situation with a failing WD notebook drive.  Lots of read 
failures, and it got very hot.  The same model replacement drive ran 
normally, just warm.



Re: bad sector in gmirror HDD

2011-08-19 Thread Daniel Kalchev

On Aug 20, 2011, at 06:24 , Jeremy Chadwick wrote:

> You might also be wondering "that dd command writes 512 bytes of zero to
> that LBA; what about the old data that was there, in the case that the
> drive remaps the LBA?"

If you write zeros at OS level to an LBA, you will end up with zeros at that 
LBA. What else did you expect???
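Writing zeros to a single LBA is just a one-sector write at offset LBA
times the sector size.  The dd equivalent is "if=/dev/zero bs=512
seek=LBA count=1".  Below is a Python sketch, assuming 512-byte sectors.
zero_lba is a hypothetical name.  Pointed at a real disk, it destroys
whatever that sector held.  You get zeros back afterwards whether or not
the drive remapped the sector.

```python
import os


def zero_lba(path, lba, sector_size=512):
    """Overwrite one LBA with zeros and flush it. DESTROYS that sector's data."""
    fd = os.open(path, os.O_WRONLY)  # no O_TRUNC: the rest of the device is untouched
    try:
        written = os.pwrite(fd, bytes(sector_size), lba * sector_size)
        os.fsync(fd)  # make sure the write is pushed to the device
        return written
    finally:
        os.close(fd)
```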

Already-remapped LBAs in ATA are no longer visible to the user/OS. You
get a perfectly readable sector. Of course it is not at the original location, but as
you confirmed, we are done with CHS addressing.

The pending bad sectors are almost always 'corrected', that is, remapped when 
you write to that LBA.

So your script will find only one unreadable sector, and that will be the sector
that is pending reallocation.
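Here is a deliberately simplified toy model of what is described above.
It is not real firmware behaviour.  Pending LBAs fail on read.  A write to
one is assumed to always succeed by remapping it to a spare.  Remapped
LBAs read normally, so a full read scan only ever reports the LBAs that
are still pending.  ToyDrive and unreadable_lbas are made-up names.  Real
drives may instead mark the LBA uncorrectable, as noted earlier in the
thread.

```python
import errno


class ToyDrive:
    """Toy model only; real firmware is far more complicated."""

    def __init__(self, nsectors, pending=(), sector_size=512):
        self.sector_size = sector_size
        self.data = [bytes(sector_size)] * nsectors
        self.pending = set(pending)  # plays the role of attribute 197
        self.reallocated = 0         # plays the role of attribute 5

    def read(self, lba):
        if lba in self.pending:
            raise OSError(errno.EIO, "Input/output error")
        return self.data[lba]

    def write(self, lba, buf):
        if len(buf) != self.sector_size:
            raise ValueError("writes must cover exactly one sector")
        if lba in self.pending:
            # Assumption: re-analysis always ends in a remap to a spare sector.
            self.pending.discard(lba)
            self.reallocated += 1
        self.data[lba] = bytes(buf)


def unreadable_lbas(drive, nsectors):
    """What a read-only surface scan would report."""
    bad = []
    for lba in range(nsectors):
        try:
            drive.read(lba)
        except OSError:
            bad.append(lba)
    return bad
```

For example, ToyDrive(8, pending=[3]) reports [3] until write(3,
bytes(512)) is issued.  After that the scan is clean and reallocated is 1.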

It may be that writing zeros to all free space, like

dd if=/dev/zero of=/filesystem/zero bs=1m; rm /filesystem/zero

is enough to remap the pending bad block and not have any unreadable sectors. 
But if the unreadable sector is in a file or directory -- bad luck -- these 
will need to be rewritten.
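Here is a Python sketch equivalent to that dd/rm one-liner.  It writes
zero-filled chunks until the filesystem reports ENOSPC.  It then calls
fsync so the zeros actually reach the disk before the file is deleted.
Finally, it removes the file even if something failed along the way.  The
limit parameter is an addition for trying it on a real filesystem without
filling it.  The function name is hypothetical.

```python
import errno
import os


def zero_fill_free_space(path, chunk=1 << 20, limit=None):
    """Fill free space with a zero file, then delete it. Returns bytes written.

    Stops on ENOSPC (filesystem full) or after `limit` bytes if one is given.
    """
    zeros = bytes(chunk)
    written = 0
    try:
        with open(path, "wb", buffering=0) as f:
            while limit is None or written < limit:
                n = chunk if limit is None else min(chunk, limit - written)
                try:
                    written += f.write(zeros[:n])
                except OSError as e:
                    if e.errno != errno.ENOSPC:
                        raise
                    break  # filesystem is full
            try:
                os.fsync(f.fileno())  # get the zeros onto the disk before deleting
            except OSError as e:
                if e.errno != errno.ENOSPC:
                    raise
    finally:
        if os.path.exists(path):
            os.remove(path)  # like the "; rm", runs even after an error
    return written
```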

Once upon a time, BSD/OS had a wonderful disk 'repair' utility. It could detect
failing disks by reading every sector (it had a nice visual display), or could re-write the
drive by reading and writing back every sector. On bad blocks it would retry
lots of times and eventually average what was read (with errors).
Having said that, I doubt modern ATA drives will let anything be read from the
pending bad block, but... who knows.
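How the BSD/OS tool "averaged" its reads isn't described here.  The
Python sketch below substitutes a per-byte majority vote over whichever
retries returned a full sector.  Treat it as one plausible interpretation,
not a reconstruction of that utility.  read_once is a hypothetical
callable that either returns the sector's bytes or raises OSError.  With
a modern drive that returns nothing for a failed read, only the
successful retries contribute.

```python
from collections import Counter


def recover_sector(read_once, attempts=100, sector_size=512):
    """Retry a flaky sector and merge the successful copies by majority vote.

    Returns None if no attempt produced a full sector.
    """
    copies = []
    for _ in range(attempts):
        try:
            buf = read_once()
        except OSError:
            continue  # failed read: just try again
        if len(buf) == sector_size:
            copies.append(buf)
    if not copies:
        return None
    # zip(*copies) walks byte position by byte position across all copies.
    return bytes(Counter(column).most_common(1)[0][0] for column in zip(*copies))
```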

Daniel
