Follow-up : happy end ...

It took quite some tinkering, but... I have my data back...

I ended up starting without the troublesome ZFS storage array, uninstalled the 
iscsitarget software and re-installed it... just to have Solaris boot without 
complaining about missing modules...

That left me with a system that would boot as long as the storage was 
disconnected... Reconnecting it made the boot stop at the hostname line. Then 
the disk activity light would flash every second or so, forever... I then 
rebooted using milestone=none. That worked, even with the storage attached! So 
now I was sure that some software process was causing the hang (or what 
appeared to be a hang). In milestone none I could verify that the pool was 
intact, and so it was... fortunately I had not broken the pool itself: all 
online with no errors to report.
I then went to milestone all, which again made the system hang with the disk 
activity light flashing every second, forever. I think the task doing this was 
devfsadm. I then "assumed on a gut feeling" that the system was somehow 
scanning or checking the pool. In a desperate attempt I left the system running 
overnight, because I had calculated that checking the 500GB would take about 
8 hrs if the system *really* scanned everything. (I copied a 1 TB drive last 
week, which took nearly 20 hrs, so I had learned that sometimes I need to 
wait... copying these big disks takes a *lot* of time!)
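
As a rough sanity check on that kind of estimate, here is a minimal Python 
sketch. It assumes the time scales linearly with the amount of data, that a 
pool check would run at about the same rate as the 1 TB disk copy (1000 GB in 
20 hrs), and that sizes are in decimal GB. None of this was measured on the 
system above, and the function name estimate_hours is just an illustration.

```python
def estimate_hours(size_gb, ref_size_gb=1000.0, ref_hours=20.0):
    """Scale a known copy time linearly to another data size.

    The defaults come from the 1 TB copy that took nearly 20 hrs;
    treating a pool check as equally fast is an assumption.
    """
    rate_gb_per_hour = ref_size_gb / ref_hours  # 50 GB per hour
    return size_gb / rate_gb_per_hour


# 500 GB is the pool discussed here, 10000 GB a hypothetical 10TB system.
for size_gb in (500, 10000):
    print(f"{size_gb} GB: about {estimate_hours(size_gb):.0f} hours")
```

At that rate 500 GB comes out at about 10 hours, the same order as the 8 hrs 
guessed above, and a 10TB system would need about 200 hours if it really had 
to read everything.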

This morning I switched on the monitor and lo and behold: a login screen!!!!
The store was there!

Lesson for myself and others: you MUST wait at the hostname line: the system 
WILL eventually come online... but don't ask how long it takes... I hate to 
think how long it would take if I had a 10TB system. (but then again, a 
file-system-check on an ext2 disk also takes forever...)

I re-enabled iscsitgtd and did a list: it saw one of the two targets! 
(Which was OK, because I remembered that I had turned off the shareiscsi flag 
on the second share.)
I then went ahead and connected the system back into the network and "repaired" 
the iscsi-target on the virtual mainframe: WORKED! Copied the virtual disks 
over to local store so I can at least start up these servers again asap.
Then set the shareiscsi flag on the second and most important share: OK! Listed 
the targets: THERE, BOTH! Repaired its connection too: WORKED...!

I am copying everything away from the ZFS pools now, but my data is 
recovered... fortunately.

I now have mixed feelings about the ordeal: yes, Sun Solaris kept its promise: I 
did not lose my data. But the time and trouble it took to recover in this case 
(an overnight wait just to restart a system, for example!) is something 
that a few of my customers would *seriously* dislike...

But: a happy end after all... most important: the data is rescued, and second 
most important: I learned a lot in the process...

Bye
Luc De Meyer
Belgium
-- 
This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
