Toby Thain wrote:
> On 24-Nov-08, at 3:49 PM, Miles Nordin wrote:
>
>>>>>>> "tt" == Toby Thain <[EMAIL PROTECTED]> writes:
>>
>>     tt> Why would it be assumed to be a bug in Solaris? Seems more
>>     tt> likely on balance to be a problem in the error reporting path
>>     tt> or a controller/firmware weakness.
>>
>> It's not really an assumption. It's been discussed in here a lot, and
>> we know why it's happening. It's just a case of ``it's a feature not
>> a bug'' combined with ``somebody else's problem.''
>>
>> The error-reporting path you mention is inside Solaris, so I have a
>> little trouble decoding your statement.
>
> Not all of it is!
>
> I don't see how anyone could confidently correlate "behaviour after
> sledgehammer impact" with a specific fault in Solaris, without doing
> a lot more investigation than "watching a YouTube video". Perhaps
> this has already been narrowed down to a specific root cause within
> Solaris - I just didn't see enough data in the OP's post to indicate
> that.
>
We could add strain sensors to disk drives which, when the strain was suddenly too great, would register an ASC/ASCQ 75/00 "DEVICE WAS HIT BY A HAMMER". Then we could add the e-report to sd and register it with an "io-hammer-event" FMA diagnosis engine, which would be registered to ZFS to offline the device :-)

But seriously, it really does depend on the failure mode of the device, and I'm not sure people have studied the hammer case very closely. In the worst case, the device would be selectable but not responding to data requests, which would lead through the device retry logic and can take minutes. If the (USB) device simply disappeared, it would be indistinguishable from a hot-plug event, and that logic would take over, which results in a faster diagnosis. I suppose it will depend on the device and your aim.

 -- richard

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
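To put rough numbers on the "can take minutes" case above, here is a back-of-the-envelope sketch. The timeout, retry count, and hot-plug latency are illustrative assumptions, not values read out of the sd driver, and the function name is made up for this example. It only shows why a device that stays selectable but never answers costs far more time than one that vanishes from the bus.

```python
def stalled_device_seconds(io_timeout_s, retries):
    """Seconds before one command is abandoned when the device never answers.

    The first attempt and every retry each wait out the full timeout.
    """
    return io_timeout_s * (1 + retries)


# Illustrative assumptions only; real values depend on the driver and its tuning.
IO_TIMEOUT_S = 60      # assumed per-command timeout
RETRIES = 5            # assumed retry count
HOTPLUG_NOTICE_S = 1   # assumed time for a removal event to be delivered

stalled = stalled_device_seconds(IO_TIMEOUT_S, RETRIES)
print(f"selectable but silent: about {stalled / 60:.0f} min per outstanding command")
print(f"device disappeared:    about {HOTPLUG_NOTICE_S} s until the removal is seen")
```

With several commands outstanding, or with more retries layered on top, the first figure only grows, which is the worst case described above.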