We're currently evaluating ZFS prior to (hopefully) rolling it out across our server room, and we've managed to lock up a server by connecting to an iSCSI target and then changing the target's IP address.
Basically we have two test Solaris servers running. I followed the instructions in the post below to share a zpool on Server1 as an iSCSI target, and then import it into a new zpool on Server2:

http://blogs.sun.com/chrisg/date/20070418

Everything appeared to work fine until I moved the servers to a new network (while powered on), which changed their IP addresses. The server running the iSCSI target is still fine: it has its new IP address, and from another machine I can see that the target is still visible. Server2, however, was not as happy with the move. As far as I can tell, every ZFS command locked up on it. "zfs list", "zpool list", "zpool status" and "zpool iostat" all hung, and I couldn't find any way to stop them. I've seen a few posts about ZFS commands locking up before, but this is very concerning for something we're considering using in a production system.

Anyway, with Server2 well and truly locked up, I restarted it, hoping that would clear the problem (figuring ZFS would simply mark the device as offline), but found that the server can't even boot. For the past hour it has simply spammed the following message to the screen:

NOTICE: iscsi connection(27) unable to connect to target iqn.1986-03.com.sun:02:3d882af1-91cc-6d9e-9f19-edfa095fca6d

Now that I wasn't expecting. This volume isn't a boot volume, the server doesn't need either ZFS or iSCSI to boot, and I don't think I even saved any data on that drive. I have found a post reporting a similar message, describing a ten-minute boot delay with a working iSCSI volume, but I can't find anything that says what happens if the iSCSI volume is no longer there:

http://forum.java.sun.com/thread.jspa?threadID=5243777&messageID=10004717

So, I have quite a few questions:

1. Does anybody know how I can recover from this, or am I going to have to wipe my test server and start again?
2. How vulnerable are the ZFS admin tools to locking up like this?
3. How vulnerable is the iSCSI initiator to locking up like this during boot?
4. Is there any way to disconnect the iSCSI share while ZFS is locked up like this? What could I have tried to regain control of my server before rebooting?
5. If I can get the server booted, is there any way to redirect an iSCSI volume so it points at the new IP address? (I was expecting simply to do a "zpool replace" once ZFS reported the drive as missing.)

And finally, does anybody know why "zpool status" should lock up like this? I'm really not happy that the ZFS admin tools seem so fragile. At the very least I would have expected "zpool status" to list the devices attached to the pools and report that they are timing out or erroring, and to be able to use the other ZFS tools to forcibly remove failed drives as needed. Anything less means I'm risking my whole system should ZFS find something it doesn't like. I admit I'm a Solaris newbie, but surely something designed as a robust filesystem also needs robust management tools?

This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
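In case it helps anyone answer question 5, this is the recovery sequence I'm planning to attempt, pieced together from the iscsiadm(1M) and svcadm(1M) man pages. It is untested: the service FMRI may differ between releases, and the IP addresses below are placeholders for my old and new target addresses.

```shell
# Step 1 (at the OBP/GRUB prompt, not in this script): boot without
# starting most services, so the iSCSI initiator can't hang the boot:
#   boot -m milestone=none

# Step 2: stop the iSCSI initiator service
# (FMRI is my guess for this release -- check with "svcs | grep iscsi"):
svcadm disable network/iscsi_initiator

# Step 3: repoint sendtargets discovery at the target's new address
# (192.168.1.10 = old placeholder IP, 192.168.2.10 = new placeholder IP;
# 3260 is the default iSCSI port):
iscsiadm remove discovery-address 192.168.1.10:3260
iscsiadm add discovery-address 192.168.2.10:3260

# Step 4: re-enable the initiator and see whether the pool comes back:
svcadm enable network/iscsi_initiator
zpool status
```

If someone can confirm whether the initiator will actually give up on the stale address once its discovery entry is removed, rather than retrying forever at boot, that would answer most of my questions above.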