Hi, I have a 3-node (Sun Fire V890) VCS cluster running Solaris 10 u4 with LUNs from some Sun 6130, 6140 and IBM 8100 arrays. It has been working well, but one of the nodes started having trouble running ZFS commands this Tue, 2/19. Any ZFS command, e.g. 'zpool import', can take hours to complete. Sometimes it takes 4-5 minutes, and when I run it again it can take 60 minutes. The other 2 nodes, which share the same set of LUNs, are still normal so far: the same commands take 5-10 seconds or less. I haven't noticed any error messages from the arrays or SAN switches, and apart from the HBAs and switch ports the nodes are virtually identical. (Other commands like cfgadm, format, ... seem normal, so I suspect the culprit is related to ZFS. I opened a case with Sun, but that route seems to take forever for this kind of issue and I haven't got any answer yet.)
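To put numbers on the difference between the nodes, the same commands could be timed on each one with a hard time limit. The following is only a minimal sketch, assuming Python 3.5 or later is available on the nodes (that is an assumption, nothing above says so); the helper name `time_command` and the 300-second limit are made up for illustration.

```python
import subprocess
import time


def time_command(argv, timeout_s=60.0):
    """Run argv and return (status, elapsed_seconds).

    status is one of:
      "ok"      - command exited with status 0
      "failed"  - command exited with a non-zero status
      "timeout" - command was killed after timeout_s seconds
      "not_run" - command could not be started (e.g. not installed)
    """
    start = time.monotonic()
    try:
        result = subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
        status = "ok" if result.returncode == 0 else "failed"
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this exception.
        status = "timeout"
    except OSError:
        status = "not_run"
    return status, time.monotonic() - start


if __name__ == "__main__":
    # 'zpool import' with no arguments only lists importable pools.
    commands = (["zpool", "status"], ["zpool", "list"], ["zpool", "import"])
    for argv in commands:
        status, elapsed = time_command(argv, timeout_s=300.0)
        print("%-14s %-8s %8.1f s" % (" ".join(argv), status, elapsed))
```

Running this on the slow node and on one of the healthy nodes would give comparable timings, and it separates "slow but successful" from "failed" and "timed out", which a plain pass/fail probe cannot do.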
The host is not down or crashed. I rebooted it once today, but I'm not sure the reboot fixed anything: 'zpool import' can still take minutes rather than seconds to complete. I still need to create some test LUNs and pools for more tests. Everything except ZFS still seems normal. Most zfs commands also drive the CPU load up until they complete, as seen in vmstat, mpstat, or top. This has been causing us trouble because our home-grown VCS ZFS agent considers the zpool dead after some number of consecutive failures in probing the pool (zpool status takes forever to complete). Does anyone have the same problem, or know what the cause or fix might be?

Thanks.
Max Holm

This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss