Hi all, 

I have a RAID-Z zpool made up of 4 x SATA drives running on Nexenta 1.0.1 
(OpenSolaris b85 kernel). It holds some ZFS filesystems and a few volumes 
that are shared to various Windows boxes over iSCSI. On one particular iSCSI 
volume, I discovered that I had mistakenly deleted some files from the FAT32 
partition that is on it. The files were still in a ZFS snapshot that was made 
earlier in the morning so I made use of the ZFS clone command to create a 
separate copy of the volume. I accessed it in Windows, got the files I needed, 
and then proceeded to delete it using "zfs destroy". During the process, disk 
activity stopped, my SSH windows stopped responding and Windows lost all iSCSI 
connections, reporting delayed write failed for the volumes that disappeared. 
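
For reference, the clone-and-destroy sequence described above can be sketched roughly as follows. The volume, snapshot and clone names (winvol, morning, winvol_recover) are hypothetical placeholders, not the names I actually used, and the script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Hypothetical names: substitute the real volume, snapshot and clone names.
POOL=zp
VOL=winvol
SNAP=morning
CLONE=winvol_recover

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

recover_plan() {
    # List the snapshots of the volume to find the one to clone.
    run zfs list -t snapshot -r "$POOL/$VOL"
    # Create a writable clone of the snapshot.
    run zfs clone "$POOL/$VOL@$SNAP" "$POOL/$CLONE"
    # After copying the files off the clone, remove it again.
    run zfs destroy "$POOL/$CLONE"
}

recover_plan
```

The last command is the step during which the system stopped responding.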

I powered down the Nexenta box and started it back up, where it hung with the 
following output: 

SunOS Release 5.11 Version NexentaOS_20080312 64-bit 
Loading Nexenta... 
Hostname: mammoth 

This is before the usual "Reading ZFS config: done" and "Mounting ZFS 
filesystems" indicators. The only way I could bring the system up was to disconnect 
all four SATA drives before power-on. I can then export the zpool, reboot, and 
the system comes up without complaint. However, of course, the pool isn't 
imported. When I execute "zpool import", the pool is detected fine: 

  pool: zp
    id: 2070286287887108251
 state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:

        zp          ONLINE
          raidz1    ONLINE
            c0t1d0  ONLINE
            c0t0d0  ONLINE
            c0t3d0  ONLINE
            c0t2d0  ONLINE

The next issue is that when the pool is actually imported ("zpool import -f 
zp"), it too hangs the whole system, albeit after a minute or so of disk 
activity. A "zpool iostat zp 10" during that time is below: 

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
zp          1.73T  1018G  1.13K      7  4.43M  23.6K
zp          1.73T  1018G  1.05K      0  4.07M      0
zp          1.73T  1018G  1.15K      0  4.88M      0
zp          1.73T  1018G    457      0  1.36M      0
zp          1.73T  1018G    668      0  2.49M      0
zp          1.73T  1018G    411      0  1.80M      0
[system stopped at this point and wouldn't accept keypresses any more] 

I'm lost as to what to do - every time the pool is imported, it briefly turns 
up in "zpool status", but will then hang the system to the extent that I must 
power off, disconnect drives, power up, zpool export, and reboot, just to be 
able to start typing commands again!! 

So far I've tried: 
1. Rebooting with only one of the SATA drives attached at a time. All four 
times the OS came up fine, but of course "zpool status" reported the pool as 
having insufficient replicas. I don't know whether powering up with two or 
three drives will work; I didn't want to try any permutations in case I made 
things worse. 

2. Checking with "fmdump -e": the only ZFS-related output concerns missing 
vdevs, and is presumably from the times I rebooted with drives 
disconnected. 

3. "dd if=/dev/rdsk/c0t0d0 of=/dev/null bs=1048576" and the equivalents for 
the other three drives are all currently running and I await the results. Given 
that a scrub takes about 7 hours, I expect I'll have to leave this overnight. 

4. "zdb -e zp" is now at the stage of "Traversing all blocks to verify 
checksums and verify nothing leaked ...". I expect this will also take some 
time. 
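
The raw read test from step 3 can be scripted along these lines. This is only a sketch: it walks the four device names from the "zpool import" output one after another (I started mine by hand, all at once), and it only prints the dd commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Device names taken from the "zpool import" listing.
DISKS="c0t0d0 c0t1d0 c0t2d0 c0t3d0"

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

read_all_disks() {
    for d in $DISKS; do
        # Read the whole raw device and discard the data; a failing
        # drive should show up as a dd I/O error.
        run dd if="/dev/rdsk/$d" of=/dev/null bs=1048576
    done
}

read_all_disks
```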

While I wait for the results from "dd" and "zdb", is there anything else I can 
try in order to get the pool up and running again? 

I have spotted some previous, similar posts regarding hanging, notably this 
one: 
http://opensolaris.org/jive/thread.jspa?threadID=70205&tstart=15 
Unfortunately, I am a bit of a Nexenta/OpenSolaris/Unix newbie so a lot of that 
is way over my head, and when the system completely hangs, I have no choice but 
to power off. Any help is much appreciated! 

Thanks,
Chris
 
 
This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
