The combination of dev_loss_tmo off and reconnect_delay > 0 worked fine in my tests. An I/O failure was detected shortly after the cable to the target was pulled. I/O resumed shortly after the cable to the target was reinserted.

Though, now that I've unpacked it -- I don't think it is OK for dev_loss_tmo to be off, but fast IO to be on? That drops another conditional.

Perhaps I don't understand your answer -- I'm asking about dev_loss_tmo < 0, and fast_io_fail_tmo >= 0. The other transports do not allow this scenario, and I'm asking if it makes sense for SRP to allow it. But now that you mention reconnect_delay, what is the meaning of that when it is negative? That's not in the documentation. And should it be considered in srp_tmo_valid() -- are there values of reconnect_delay that cause problems? I'm starting to get a bit concerned about this patch -- can you, Vu, and Sebastian comment on the testing you have done?
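
To make the dev_loss_tmo < 0 and fast_io_fail_tmo >= 0 question concrete, here is a minimal sketch of the stricter check being asked about. It is not the srp_tmo_valid() from the patch: the function name tmo_combination_check() is hypothetical, a negative value is assumed to mean "off", and the ordering rule (fast_io_fail_tmo must be smaller than dev_loss_tmo when both are on) is an assumption borrowed from how I understand the FC transport to behave. reconnect_delay is left out on purpose, since what a negative value means there is still the open question.

```c
#include <errno.h>

/*
 * Hypothetical helper for illustration only; NOT the kernel's
 * srp_tmo_valid(). Assumption: a negative timeout means "off".
 * Returns 0 if the combination is accepted, -EINVAL otherwise.
 */
int tmo_combination_check(int fast_io_fail_tmo, int dev_loss_tmo)
{
	/* The case asked about: dev_loss_tmo off while fast I/O fail is on. */
	if (dev_loss_tmo < 0 && fast_io_fail_tmo >= 0)
		return -EINVAL;

	/*
	 * Assumed ordering rule: when both timers are on, fast I/O fail
	 * has to fire before the device-loss timer does.
	 */
	if (fast_io_fail_tmo >= 0 && dev_loss_tmo >= 0 &&
	    fast_io_fail_tmo >= dev_loss_tmo)
		return -EINVAL;

	return 0;
}
```

With this sketch, tmo_combination_check(15, -1) is rejected, while tmo_combination_check(15, 60) and tmo_combination_check(-1, -1) are accepted. Whether SRP should actually be this strict is exactly what is being asked.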
Hello Bart,

After running the cable pull test on two local IB links for several hours, I/Os got stuck. Further commands such as "multipath -ll" or "fdisk -l" also got stuck and never returned.

Here is the stack dump for the srp-x kernel threads. I'll run with #DEBUG to get more debug info on the scsi host & rport.

-vu
srp_threads.txt.tgz
Description: application/compressed