On 9 apr 2010, at 12.04, Andreas Höschler wrote:

> Hi Ragnar,
> 
>>> I need to replace a disk in a zfs pool on a production server (X4240 
>>> running Solaris 10) today and won't have access to my documentation there. 
>>> That's why I would like to have a good plan on paper before driving to that 
>>> location. :-)
>>> 
>>> The current tank pool looks as follows:
>>> 
>>> pool: tank
>>> state: ONLINE
>>> scrub: none requested
>>> config:
>>> 
>>>       NAME         STATE     READ WRITE CKSUM
>>>       tank         ONLINE       0     0     0
>>>         mirror     ONLINE       0     0     0
>>>           c1t2d0   ONLINE       0     0     0
>>>           c1t3d0   ONLINE       0     0     0
>>>         mirror     ONLINE       0     0     0
>>>           c1t5d0   ONLINE       0     0     0
>>>           c1t4d0   ONLINE       0     0     0
>>>         mirror     ONLINE       0     0     0
>>>           c1t15d0  ONLINE       0     0     0
>>>           c1t7d0   ONLINE       0     0     0
>>>         mirror     ONLINE       0     0     0
>>>           c1t8d0   ONLINE       0     0     0
>>>           c1t9d0   ONLINE       0     0     0
>>>         mirror     ONLINE       0     0     0
>>>           c1t10d0  ONLINE       0     0     0
>>>           c1t11d0  ONLINE       0     0     0
>>>         mirror     ONLINE       0     0     0
>>>           c1t12d0  ONLINE       0     0     0
>>>           c1t13d0  ONLINE       0     0     0
>>> 
>>> errors: No known data errors
>>> 
>>> Note that disk c1t15d0 is being used and has taken over the duty of c1t6d0.
>>> c1t6d0 failed and was replaced with a new disk a couple of months ago.
>>> However, the new disk does not show up in /dev/rdsk and /dev/dsk. I was
>>> told that the disk has to be initialized first with the SCSI BIOS. I am going
>>> to do so today (reboot the server). Once the disk shows up in /dev/rdsk I
>>> am planning to do the following:
>> 
>> I don't think the BIOS and reboot part should ever be necessary,
>> at least I hope not. You shouldn't have to reboot just because
>> you replace a hot-plug disk.
> 
> Hard to believe! But that's the most recent state of affairs. Not even the
> Sun technician could get the disk to show up in /dev/dsk. They have replaced it 3
> times, assuming it to be defective! :-)
> 
> I tried to remotely reboot the server (with LOM) and go into the SCSI BIOS to 
> initialize the disk, but the BIOS requires a key combination to initialize 
> the disk that does not go through the remote connections (don't remember 
> which one). That's why I am planning to drive to the remote location and do 
> it manually with a server reboot and keyboard and screen attached like in the 
> very old days. :-(

Yes, this is one of the many reasons why you should never
be forced to do anything in a non-booted state (such as in a
BIOS setup screen). :-(

>> Depending on the hardware and the state
>> of your system, it might not be the problem at all, and rebooting may
>> not help. Are the device links for c1t6* gone in /dev/(r)dsk?
>> Then someone must have run "devfsadm -C" or something like that.
>> You could try "devfsadm -sv" to see if it wants to (re)create any
>> device links. If you think that it looks good, run it with "devfsadm -v".
>> 
>> If it is the HBA/raid controller acting up and not showing recently
>> inserted drives, you should be able to talk to it with a program
>> from within the OS. raidctl for some LSI HBAs, and arcconf for
>> some SUN/StorageTek HBAs.
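
The devfsadm check described above could be sketched roughly as follows. This is only an illustration: the disk name c1t6d0 is taken from this thread, while count_links and DEV_ROOT are hypothetical helper names introduced here so the counting part can be tried against a scratch directory.

```shell
#!/bin/sh
# Sketch: preview device-link changes before applying them.
# DISK defaults to the slot discussed in this thread (an assumption).
DISK="${DISK:-c1t6d0}"

# Count entries such as c1t6d0s0, c1t6d0s2, ... under dsk and rdsk.
# DEV_ROOT is a hypothetical override, useful for trying this out safely.
count_links() {
    ls "${DEV_ROOT:-/dev}/dsk" "${DEV_ROOT:-/dev}/rdsk" 2>/dev/null | grep -c "^$1"
}

if command -v devfsadm >/dev/null 2>&1; then
    echo "links for $DISK before: $(count_links "$DISK")"
    # -s suppresses changes to /dev, -v reports what would be done.
    devfsadm -sv
    # After reviewing the output above, apply it for real with:
    # devfsadm -v
else
    echo "devfsadm not found; this sketch only makes sense on Solaris"
fi
```

If the dry run shows nothing for the new disk, the device links are probably not the problem and the controller is the next thing to look at.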
> 
> I have /usr/sbin/raidctl on that machine and just studied the man page of 
> this tool. But I couldn't find hints of how to initialize a disk c1t16d0. It 
> just talks about setting up raid volumes!? :-(

If the HBA/raid controller really is the problem, it probably
wants you to tell it how it should present the disk to the
computer (as part of a raid, as a jbod disk, etc.). It could also
be that it just wants you to initialize the disk for it, or that
it sees the disk has been used in another raid configuration
before and wants you to acknowledge that you want to
reinitialize it.
Hopefully you can just put the disk and slot in a straight-through,
auto-replace, jbod-like mode.
But this might not even be the problem.

What HBA/raid controller do you have?
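
One way to answer that from the running system might be to look at the driver names in "prtconf -D" output. The sketch below rests on an assumption that should be checked against your hardware: that an LSI HBA shows up under the mpt driver (managed with raidctl) and an Adaptec-based STK RAID card under the aac driver (managed with arcconf). suggest_tool is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: guess the management tool from driver names in prtconf -D output.
# The driver-to-tool mapping below is an assumption, not a complete list.
suggest_tool() {
    # Reads "prtconf -D" style text on stdin.
    input=$(cat)
    case "$input" in
        *"driver name: aac)"*) echo "arcconf" ;;
        *"driver name: mpt)"*) echo "raidctl" ;;
        *) echo "unknown" ;;
    esac
}

if command -v prtconf >/dev/null 2>&1; then
    prtconf -D 2>/dev/null | suggest_tool
else
    echo "prtconf not found; this sketch only makes sense on Solaris"
fi
```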

(If you have a STK-RAID-INT or similar, chances are that it
is actually the Adaptec/Intel thing, and you will have to get
the software for it here:
<http://www.intel.com/support/go/sunraid.htm>
You can just download it and use .../cmdline/arcconf directly,
no need to install anything.)
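
As a rough sketch of using arcconf straight from the download: the wrapper below asks the controller for its physical device list. ARCCONF must be set to wherever you unpacked the binary, and controller number 1 is an assumption (it is the usual number on a single-controller box). show_physical_disks is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: list the physical disks the Adaptec-based controller can see.
# Set ARCCONF to the unpacked binary; it defaults to "arcconf" on PATH.
ARCCONF="${ARCCONF:-arcconf}"
# Controller number 1 is an assumption; adjust if you have several.
CONTROLLER="${CONTROLLER:-1}"

show_physical_disks() {
    if command -v "$ARCCONF" >/dev/null 2>&1; then
        # "getconfig <controller> pd" prints the physical device section.
        "$ARCCONF" getconfig "$CONTROLLER" pd
    else
        echo "arcconf not found at: $ARCCONF" >&2
        return 1
    fi
}

# Usage, once ARCCONF points at the real binary:
# show_physical_disks
```

If the new disk is listed there but not in /dev/dsk, the controller sees it and is just not presenting it to the OS yet.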

It may also be something with "cfgadm", which you may have to
use on some models (X4500, I believe) when you are replacing
disks. I don't have one of those machines, and I haven't
understood why you should have to use cfgadm on those systems
either.
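
For what it's worth, a cfgadm check might look something like the sketch below. It assumes the usual "cfgadm -al" column order (Ap_Id, Type, Receptacle, Occupant, Condition) and only prints the configure command instead of running it. unconfigured_ap_ids is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: find attachment points that are connected but not configured.
# Assumed column order: Ap_Id Type Receptacle Occupant Condition.
unconfigured_ap_ids() {
    # Reads "cfgadm -al" style text on stdin, prints matching Ap_Ids.
    awk '$3 == "connected" && $4 == "unconfigured" { print $1 }'
}

if command -v cfgadm >/dev/null 2>&1; then
    for ap in $(cfgadm -al | unconfigured_ap_ids); do
        # Printed rather than executed, so nothing changes until you have
        # checked that the attachment point really is the new disk.
        echo "candidate: cfgadm -c configure $ap"
    done
else
    echo "cfgadm not found; this sketch only makes sense on Solaris"
fi
```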

/ragge

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
