> And to panic? How can that in any sane way be good
> way to "protect" the application?
> *BANG* - no chance at all for the application to
> handle the problem...
I agree -- a disk error should never be fatal to the system; at worst, the file system should appear to have been forcibly unmounted (and "worst" really means that critical metadata, like the superblock/uberblock, can't be updated on any of the disks in the pool). That at least gives other applications which aren't using the file system the chance to keep going.

An I/O error detected while writing a file can be reported at write() time, fsync() time, or close() time. Any application which doesn't check all three of these won't handle all I/O errors properly, and applications which care about knowing that their data is on disk must either use synchronous writes (O_SYNC/O_DSYNC) or call fsync() before closing the file. ZFS should report these errors back in all cases and, obviously, avoid panicking. (A minimal sketch of this checking pattern is at the end of this message.)

That said, it also appears that the device drivers (either the FibreChannel or SCSI disk drivers in this case) are misbehaving. The FC driver appears to be reporting back an error which the SCSI disk driver interprets as fatal, when one or the other should be retrying the I/O. (It also appears that either the FC driver, the SCSI disk driver, or ZFS is misbehaving in the observed hang.)

So ZFS should be more resilient against write errors, and the SCSI disk and FC drivers should be more resilient against LIPs (the most likely cause of your problem) or other transient errors. (Alternatively, the ifp driver should be updated to support the maximum number of targets on a loop, which might also solve your second problem.)
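For what it's worth, here's a minimal sketch of that check-all-three pattern; the path /tank/important.dat and the data are made up for illustration:

    /*
     * Sketch: check the return values of write(), fsync(), AND close().
     * An application that skips any of these can miss a reported I/O error.
     * The file name and payload below are hypothetical.
     */
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <stdio.h>
    #include <string.h>

    int
    main(void)
    {
            const char *msg = "important data\n";

            /*
             * Opening with O_DSYNC instead would make each write()
             * synchronous, in which case the explicit fsync() below
             * becomes unnecessary.
             */
            int fd = open("/tank/important.dat",
                O_WRONLY | O_CREAT | O_TRUNC, 0644);
            if (fd == -1) {
                    perror("open");
                    return (1);
            }
            if (write(fd, msg, strlen(msg)) != (ssize_t)strlen(msg)) {
                    perror("write");        /* error reported at write() time */
                    (void) close(fd);
                    return (1);
            }
            if (fsync(fd) == -1) {
                    perror("fsync");        /* error reported at fsync() time */
                    (void) close(fd);
                    return (1);
            }
            if (close(fd) == -1) {
                    perror("close");        /* error reported at close() time */
                    return (1);
            }
            return (0);
    }

Only after all three calls succeed may the application assume the data reached stable storage.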