Eric Schrock wrote:
> On Tue, Dec 12, 2006 at 02:08:57PM -0500, James F. Hranicky wrote:
>> Sure, but that's what I want to avoid. The FMA agent should do this by
>> itself, but it's not, so I guess I'm just wondering why, or if there's
>> a good way to get it to do so. If this happens in the middle of the
>> night I don't want to have to run the commands by hand.
>
> Yes, the FMA agent should do this. Can you run 'fmdump -v' and see if
> the DE correctly identified the faulted devices?

Here you go:

    # fmdump -v
    TIME                 UUID                                 SUNW-MSG-ID
    Nov 29 16:29:12.1947 e50198f2-2eb9-c58b-d7c5-87aaae5cb935 ZFS-8000-D3
      100%  fault.fs.zfs.device

            Problem in: zfs://pool=8e63f0b8e4263e71/vdev=9272c0973ecdb27c
               Affects: zfs://pool=8e63f0b8e4263e71/vdev=9272c0973ecdb27c
                   FRU: -

    Nov 30 10:31:48.8844 1a44a780-05c0-cb6e-d44f-f1d8999f40e5 ZFS-8000-D3
      100%  fault.fs.zfs.device

            Problem in: zfs://pool=51f1caf6cad1aa2f/vdev=769276842b0efd54
               Affects: zfs://pool=51f1caf6cad1aa2f/vdev=769276842b0efd54
                   FRU: -

    Dec 11 14:04:57.8803 c46d21e0-200d-43a1-e5db-ae9c9ebf3482 ZFS-8000-D3
      100%  fault.fs.zfs.device

            Problem in: zfs://pool=2646e20c1cb0a9d0/vdev=52070de44ec80c15
               Affects: zfs://pool=2646e20c1cb0a9d0/vdev=52070de44ec80c15
                   FRU: -

    Dec 11 14:42:32.1271 1319464e-7a8c-e65b-962e-db386e90f7f2 ZFS-8000-D3
      100%  fault.fs.zfs.device

            Problem in: zfs://pool=2646e20c1cb0a9d0/vdev=724c128cdbc17745
               Affects: zfs://pool=2646e20c1cb0a9d0/vdev=724c128cdbc17745
                   FRU: -

I'm not really sure what it means.

>> For instance, the zpool command hanging or the system hanging trying to
>> reboot normally.
>
> If the SCSI commands hang forever, then there is nothing that ZFS can
> do, as a single write will never return. The more likely case is that
> the commands are continually timing out with very long response times,
> and ZFS will continue to talk to them forever. The future FMA
> integration I mentioned will solve this problem. In the meantime, you
> should be able to 'zpool offline' the affected devices by hand.

Well, as long as I know which device is affected :-> If "zpool status"
doesn't return, it may be difficult to figure out.

Do you know if the SATA controllers in a Thumper can better handle this
problem?

> There is also associated work going on to better handle asynchronous
> response times across devices. Currently, a single slow device will slow
> the entire pool to a crawl.

Do you have an idea as to when this might be available?

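As an aside, the fault entries above can be summarized mechanically. What follows is a minimal Python sketch, not part of FMA or ZFS, that groups the faulted vdev GUIDs by pool GUID. It assumes every "Problem in:" line has the `zfs://pool=.../vdev=...` form shown in this output and that the GUIDs are hexadecimal. The function name `faulted_vdevs` and the `SAMPLE` text are illustrative only, and the sketch makes no attempt to map GUIDs back to device names.

```python
import re
from collections import defaultdict

# Assumption: "Problem in:" lines always look like the ones in the output
# above, i.e. zfs://pool=<hex guid>/vdev=<hex guid>.
FAULT_RE = re.compile(r"Problem in:\s*zfs://pool=([0-9a-f]+)/vdev=([0-9a-f]+)")


def faulted_vdevs(fmdump_text):
    """Return {pool_guid: [vdev_guid, ...]} in order of first appearance."""
    by_pool = defaultdict(list)
    for pool, vdev in FAULT_RE.findall(fmdump_text):
        if vdev not in by_pool[pool]:  # skip repeated reports of one vdev
            by_pool[pool].append(vdev)
    return dict(by_pool)


# The "Problem in:" lines copied from the fmdump -v output in this message.
SAMPLE = """
Problem in: zfs://pool=8e63f0b8e4263e71/vdev=9272c0973ecdb27c
Problem in: zfs://pool=51f1caf6cad1aa2f/vdev=769276842b0efd54
Problem in: zfs://pool=2646e20c1cb0a9d0/vdev=52070de44ec80c15
Problem in: zfs://pool=2646e20c1cb0a9d0/vdev=724c128cdbc17745
"""

if __name__ == "__main__":
    for pool, vdevs in faulted_vdevs(SAMPLE).items():
        print(f"pool {pool}: {len(vdevs)} faulted vdev(s)")
        for vdev in vdevs:
            print(f"  vdev {vdev}")
```

Run against this output, the sketch reports three distinct pool GUIDs, with two different vdevs faulted in pool 2646e20c1cb0a9d0 on Dec 11.
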
Thanks for all your input,

Jim