Hi, list

Background: As we know, when a SCSI disk fails (for example, bad sectors appear on the disk or the disk is surprise-removed), the SCSI mid-layer retries the failed I/O requests endlessly unless the LLDD notifies it to stop. Unfortunately, *not all LLDDs* currently notify the mid-layer to stop retrying when the disk is in a failed state.
Here is what we saw while testing in our lab:

1. We installed kernel 2.6.11.2 on a Tiger4 platform.
2. There were two SCSI disks: sda for the root fs and sdb mounted on /mnt.
3. We ran "cp -r /usr/src/linux-2.6.11.2 /mnt".
4. While the copy was in progress, we surprise-removed sdb.
5. The system then became very busy and froze; users could not even log in, either locally or remotely.
6. The error output on the screen showed that the SCSI mid-layer was endlessly retrying the failed I/O requests.

To overcome this pathological behavior, I propose adding a new sysfs attribute to SCSI devices.

Attribute name: stop_retry_threshold

Description: the user sets a threshold value through this interface; once the SCSI mid-layer has retried "threshold" times, it automatically stops further retries so that the system calms down and remains usable for other users.

Usage example: the user runs

  echo 100 > /sys/block/sdb/device/stop_retry_threshold

to tell the SCSI mid-layer to stop further retries automatically after it has retried 100 times.

Any comments on this proposal? If there are no objections, I'll send out the patch soon.

Thanks,
Forrest