Cindy Swearingen wrote:
Hi Joe,
I have no clue why this drive was removed, particularly for a one-time
failure. I would reconnect/reseat this disk and see if the system
recognizes it. If it resilvers, then you're back in business, but I
would use zpool status and fmdump to monitor this pool and its devices
more often.
A current Solaris system also has the ability to retire a device that
is faulty. You can check this process with fmadm faulty. But I don't
think a one-time device failure (May 31) would remove this disk from
service. I'm no device-removal expert, so maybe someone else will
comment.
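For the kind of routine monitoring described above, a minimal set of
checks might look like this (a sketch only; the pool name nm is taken
from later in the thread, and the output will vary per system):

# zpool status -x nm
# fmdump
# fmdump -eV | tail
# fmadm faulty

zpool status -x reports only pools with problems, fmdump summarizes
logged fault events (-eV gives the detailed error reports shown later
in this thread), and fmadm faulty lists any resources the fault
manager has retired.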
Thanks again for all of your help, Cindy and others!
I removed the drive and reinserted it, but there was no change... So I
exported the pool and imported it again, and sure enough the drive was
recognized and started to resilver immediately. If this happens again
I'll know what to do!
Still no clue why this happened. There were no error messages, and
aside from having to add the -f flag to the export, the whole task was
quite uneventful.
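For reference, the sequence described above would roughly correspond
to the following commands (a sketch only, using the pool name nm and
device c0t7d0 from this thread; the -f on the export matches the note
that a plain export was refused):

# zpool export -f nm
(reseat the drive)
# zpool import nm
# zpool status nm

After the import, zpool status should show c0t7d0 back ONLINE with a
resilver in progress.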
Thanks,
Cindy
On 06/08/10 23:56, Joe Auty wrote:
Cindy Swearingen wrote:
According to this report, I/O to this device caused a probe failure
on May 31 because the device wasn't available.
I was curious if this device had any previous issues over a longer
period of time.
Failing or faulted drives can also kill your pool's performance.
Any idea what happened here? Some weird one-time fluky thing? Something
I ought to be concerned about?
Thanks,
Cindy
On 06/08/10 11:39, Joe Auty wrote:
Cindy Swearingen wrote:
Joe,
Yes, the device should resilver when it's back online.
You can use the fmdump -eV command to review hardware-related events
and help determine when this device was removed.
I would recommend exporting (not importing) the pool before physically
changing the hardware. After the device is back online and the pool is
imported, you might need to use zpool clear to clear the pool status.
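As a rough sketch of that workflow (pool name nm assumed from the
thread):

# zpool export nm
(reseat or replace the device)
# zpool import nm
# zpool clear nm

zpool clear resets the error counters and clears any lingering fault
status on the pool once the device is healthy again.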
Here is the output of fmdump -eV; does this reveal anything useful?
c0t7d0 is the drive that is marked as removed... I'll look into the
export and import commands to learn more about them. Thanks!
# fmdump -eV
TIME                           CLASS
May 31 2010 05:33:36.363381880 ereport.fs.zfs.probe_failure
nvlist version: 0
        class = ereport.fs.zfs.probe_failure
        ena = 0x5d2206865ac00401
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0x28ebd14a56dfe4df
                vdev = 0xdbdc49ecb5479c40
        (end detector)
        pool = nm
        pool_guid = 0x28ebd14a56dfe4df
        pool_context = 0
        pool_failmode = wait
        vdev_guid = 0xdbdc49ecb5479c40
        vdev_type = disk
        vdev_path = /dev/dsk/c0t7d0s0
        vdev_devid = id1,s...@n5000c5001e7cf7a7/a
        parent_guid = 0x16cbb2c1f07c5f51
        parent_type = raidz
        prev_state = 0x0
        __ttl = 0x1
        __tod = 0x4c038270 0x15a8c478
Thanks,
Cindy
On 06/08/10 11:11, Joe Auty wrote:
Cindy Swearingen wrote:
Hi Joe,
The REMOVED status generally means that a device was physically removed
from the system.
If necessary, physically reconnect c0t7d0, or if it is already
connected, check cabling, power, and so on.
If the device is physically connected, see what cfgadm says about this
device. For example, a device that was unconfigured from the system
would look like this:
# cfgadm -al | grep c4t2d0
c4::dsk/c4t2d0                 disk         connected    unconfigured unknown
(Finding the right cfgadm format for your h/w is another challenge.)
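If a device does turn up as unconfigured, something along these lines
may bring it back (illustrative only; as noted above, the right
attachment-point format depends on your hardware):

# cfgadm -c configure c4::dsk/c4t2d0
# cfgadm -al | grep c4t2d0

The second command should then report the device as configured.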
I'm very cautious about other people's data, so consider this issue:
If possible, export the pool while you are physically inspecting or
reseating the device. Depending on your hardware, I've heard of device
paths changing when another device is reseated or replaced.
Thanks Cindy!
Here is what cfgadm is showing me:
# cfgadm -al | grep c0t7d0
c0::dsk/c0t7d0                 disk         connected    configured   unknown
I'll definitely start with reseating the drive. I'm assuming that
once Solaris sees the drive is no longer removed it will start
resilvering on its own?
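If reseating alone doesn't clear the REMOVED state, one option (a
sketch, not something verified in this thread) is to explicitly bring
the device back online and then watch the resilver:

# zpool online nm c0t7d0
# zpool status nm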
Thanks,
Cindy
On 06/07/10 17:50, besson3c wrote:
Hello,
I have a drive that was a part of the pool showing up as "removed". I
made no changes to the machine, and there are no errors being
displayed, which is rather weird:
# zpool status nm
  pool: nm
 state: DEGRADED
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        nm          DEGRADED     0     0     0
          raidz1    DEGRADED     0     0     0
            c0t2d0  ONLINE       0     0     0
            c0t3d0  ONLINE       0     0     0
            c0t4d0  ONLINE       0     0     0
            c0t5d0  ONLINE       0     0     0
            c0t6d0  ONLINE       0     0     0
            c0t7d0  REMOVED      0     0     0
What would your advice be here? What do you think happened, and what is
the smartest way to bring this disk back up? Since there are no errors
I'm inclined to throw it back into the pool and see what happens rather
than trying to replace it straight away.
Thoughts?
--
Joe Auty, NetMusician
NetMusician helps musicians, bands and artists create beautiful,
professional, custom designed, career-essential websites that are easy
to maintain and to integrate with popular social networks.
www.netmusician.org
j...@netmusician.org