Well, I do have a plan.
Thanks to the portability of ZFS boot disks, I'll make two new OS disks on
another machine with the next Nexenta release, export the data pool, and swap
in the new ones.
That way I can at least manage a zfs scrub without killing performance and
get the Intel SSDs I
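For what it's worth, the pool move itself should just be an export followed by
an import once the new OS disks are in; a minimal sketch, with "datapool"
standing in for the real pool name:

# zpool export datapool
  (swap in the new OS disks and boot the new image)
# zpool import datapool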
It's hard to tell what caused the SMART predictive failure message; it could
have been something like a temp fluctuation. If ZFS noticed that a disk wasn't
available yet, then I would expect a message to that effect.
In any case, I think I would have a replacement disk available.
The important thing is that you continue to m
Hi Mark,
I would recheck with fmdump to see if you have any persistent errors
on the second disk.
The fmdump command will display faults, and fmdump -eV will display
errors (the persistent error reports that, based on some criteria, get
diagnosed into faults).
If fmdump -eV doesn't show any activity for that
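For reference, a quick look at both logs might be as simple as the following;
the output obviously depends on what FMA has recorded on your system:

# fmdump
# fmdump -eV | more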
Nothing like a "heart in mouth" moment to shave years from your life.
I rebooted a snv_132 box in perfect health, and it came back up with two FAULTED
disks in the same vdev.
Everything I found in an hour on Google basically said "your data is gone".
All 45 TB of it.
A postmortem of fmadm sh
Ok, this is getting weird. I just ran a zpool clear, and now it says:
# zpool clear zfspool
# zpool status
pool: zfspool
state: ONLINE
status: The pool is formatted using an older on-disk format. The pool can
still be used, but some features are unavailable.
action: Upgrade the pool u
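The action line is cut off above, but that status message normally continues by
pointing at 'zpool upgrade'. Something along these lines, bearing in mind that
an upgraded pool can no longer be imported on older builds:

# zpool upgrade -v       (show the on-disk versions this build supports)
# zpool upgrade zfspool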
Thanks Mark, it looks like that was good advice. It also appears that as
suggested, it's not the drive that's faulty... anybody have any thoughts as to
how I find what's actually the problem?
# zpool status
pool: zfspool
state: DEGRADED
status: One or more devices has experienced an unrecove
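If the drive itself checks out, a reasonable next step is to look at the
driver-level error counters and the system log, which can point at a cable,
backplane, or controller instead; these are generic Solaris checks, nothing
specific to this box:

# iostat -En
# tail -50 /var/adm/messages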
"scrub: resilver completed after 5h50m with 0 errors on Tue Jun 23 05:04:18
2009"
Zero errors even though other parts of the message definitely show errors?
This is described here: http://docs.sun.com/app/docs/doc/819-5461/gbcve?a=view
Device errors do not guarantee pool errors when redundancy
On Mon, 22 Jun 2009, Ross wrote:
All seemed well, I replaced the faulty drive, imported the pool again, and
kicked off the repair with:
# zpool replace zfspool c1t1d0
What build are you running? Between builds 105 and 113 inclusive there's
a bug in the resilver code which causes it to miss
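The running build is shown in /etc/release, and if it does fall in that range,
a full scrub on a fixed build is a cheap way to re-verify everything; the pool
name here is just taken from the earlier output:

# cat /etc/release
# zpool scrub zfspool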
On Tue, Jun 23, 2009 at 1:13 PM, Ross wrote:
> Look at how the resilver finished:
>
> c1t3d0 ONLINE 3 0 0 128K resilvered
> c1t4d0 ONLINE 0 0 11 473K resilvered
> c1t5d0 ONLINE 0 0 23 986K resilvered
Comparing from your
To be honest, never. It's a cheap server sat at home, and I never got around
to writing a script to scrub it and report errors.
I'm going to write one now though! Look at how the resilver finished:
# zpool status
pool: zfspool
state: ONLINE
status: One or more devices has experienced an unr
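A minimal scrub-and-report sketch, for anyone wanting the same thing; the pool
name, mail address, and poll interval are placeholders, and the health test
assumes the exact "is healthy" wording printed by zpool status -x:

#!/bin/sh
# Scrub a pool and mail the status if anything looks unhealthy.
POOL=zfspool
MAILTO=root

zpool scrub "$POOL"

# Poll until the scrub has finished.
while zpool status "$POOL" | grep "scrub in progress" > /dev/null
do
        sleep 300
done

STATUS=`zpool status -x "$POOL"`
if [ "$STATUS" != "pool '$POOL' is healthy" ]
then
        echo "$STATUS" | mailx -s "ZFS scrub report for $POOL" "$MAILTO"
fi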
On Mon, 22 Jun 2009, Ed Spencer wrote:
I'm curious, how often do you scrub the pool?
Once a week for me. Early every Monday morning so that if something
goes wrong, it is at the start of the week.
Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfrie
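An illustrative crontab entry for that sort of schedule, with the pool name and
start time as placeholders:

# scrub zfspool at 04:00 every Monday morning
0 4 * * 1 /usr/sbin/zpool scrub zfspool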
I'm curious, how often do you scrub the pool?
On Mon, 2009-06-22 at 15:33, Ross wrote:
> Hey folks,
>
> Well, I've had a disk fail in my home server, so I've had my first experience
> of hunting down the faulty drive and replacing it (a damn sight easier on Sun
> kit than on a home-built box, I can
Lucky one there Ross!
Makes me glad I also upgraded to RAID-Z2 ;-)
Simon
Hey folks,
Well, I've had a disk fail in my home server, so I've had my first experience
of hunting down the faulty drive and replacing it (a damn sight easier on Sun kit
than on a home-built box, I can tell you!).
All seemed well, I replaced the faulty drive, imported the pool again, and
kicked o
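For the archives, the rough sequence on a non-hot-swap box looks something like
the following; the device name is illustrative, and the cfgadm step depends on
the controller:

# zpool status -v zfspool        (note which c#t#d# is faulted)
# zpool offline zfspool c1t1d0
# cfgadm -al                     (map the device to a physical slot, if supported)
  (power down if needed and swap the physical drive)
# zpool replace zfspool c1t1d0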