Re: NFS hard semantics wanted: how to?

Mike Christie Sat, 01 Jan 2011 21:09:46 -0800

On 12/30/2010 10:09 PM, torn5 wrote:

On 12/22/2010 07:09 PM, Mike Christie wrote:

On 12/22/2010 05:57 AM, torn5 wrote:

Hello open-iscsi people
I am getting started with iSCSI, and I am currently doing some "reliability"
tests.


In particular I would like to be able to reboot the target machine
without the initiators losing data.
Like NFS hard mounts.

[CUT]

These are the errors I see:
[31291.360009] EXT4-fs (sdd1): error count: 10
[31291.360013] EXT4-fs (sdd1): initial error at 1292972264:
ext4_remount:3755
[31291.360015] EXT4-fs (sdd1): last error at 1292976117:
ext4_put_super:719
They look harmful...

Could you send the rest of your /var/log/messages? It should have some
scsi error code info and block layer error info.

Could you also turn on iscsi eh debugging


Hello Mike
Sorry for the delay in replying; I was doing zillions of tests...

The error I reported was a false alarm; everything was all right.
It was a new and misleading ext4 feature: 300 seconds after mount it
reports the last errors seen on that filesystem (coming from my earlier
tests), and you can clear that error log only by paying money to Ted Ts'o.
Just kidding :)

The log would have been cleared if I had had a newer version of fsck.ext4.
I was seeing those errors spat out during my disconnection tests and
I thought they were caused by the disconnections, but they were just an old
log.

The replacement timeout mechanism works flawlessly. My congratulations on this
excellent piece of software, and thanks for all the information.

Just a few more questions:

1- Can I raise the number of "5" resubmissions from SCSI, possibly by
recompiling the kernel? Do you know, and could you tell me, where that
number is? I grepped the sources, but there are too many values and I'm
not sure which is the right one.

It is controlled by the scsi layer and it is hardcoded in drivers/scsi/sd.h's SD_MAX_RETRIES definition.

2- Wouldn't it be better to have a separate error count for network
errors? I would raise that one. Why should a network error eat retries
meant for SCSI errors? Is it the SCSI standard that mandates treating network
failures and disk failures the same? That seems strange/unwise to me...

It is just a generic counter that says if the command does not complete in 5 retries then fail it. I don't think the value of 5 is based on anything that is specific to disks or network behavior (FC drivers, for example, do something similar for SAN problems). I think it is just based on past experience that if the IO is retryable but it does not complete in 5 retries for any reason, it is not going to complete.

It seems rare that a command would get 3 network errors and 2 disk errors, so I do not think it has come up before.
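To illustrate the point that every retryable failure, whatever its cause, draws on one shared budget, here is a minimal C sketch. `run_command()`, `enum attempt` and `MAX_RETRIES` are hypothetical names (MAX_RETRIES stands in for the value 5 of SD_MAX_RETRIES), and counting the budget as "initial attempt plus 5 retries" is an assumption about the exact bookkeeping, not a quote of the SCSI midlayer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the hardcoded SD_MAX_RETRIES value. */
#define MAX_RETRIES 5

/* Possible outcomes of one attempt at executing a command. */
enum attempt { ATTEMPT_OK, ATTEMPT_NET_ERROR, ATTEMPT_DISK_ERROR };

/*
 * Replays a sequence of attempt outcomes against a single retry budget.
 * Network and disk errors are deliberately NOT distinguished: each
 * retryable failure consumes one retry from the same counter.
 * Returns true if the command completed, false if it was failed back.
 */
bool run_command(const enum attempt *outcomes, size_t n)
{
    assert(outcomes != NULL);
    int retries = 0;

    for (size_t i = 0; i < n; i++) {
        if (outcomes[i] == ATTEMPT_OK)
            return true;            /* completed within the budget */
        if (++retries > MAX_RETRIES)
            return false;           /* budget exhausted: fail the command */
    }
    return false;                   /* ran out of recorded attempts */
}
```

With 3 network errors and 2 disk errors the budget is used up exactly, so the command still completes if the next attempt succeeds; one more failure of either kind and it is failed back. A separate network-error count, as asked in question 2, would amount to keeping two such counters instead of one.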

When the iscsi driver was getting submitted a long time ago, there was code to make the retries configurable, but it got rejected. And, I think on the linux-scsi list every so often someone sends a patch to make the retries configurable from sysfs, but it does not get picked up. If I can find the posts I will send them. I think there is more info in them.


3- this is a kinda bug report / feature request:
I wanted to raise replacement_tmo (via sysfs) to a very high value but
it wrapped around. The limit seems to be 2**31/HZ; beyond that it wraps.
It doesn't tell you anything immediately, but at the first network
disconnection it expires immediately, as if it were below zero.
Hence, with HZ=1000 the max is about 24 days.
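The 24-day figure follows from the arithmetic if, as the report suggests, the timeout is stored as seconds * HZ in a signed 32-bit variable. The C sketch below assumes exactly that; it is an illustration of the reported behaviour, not the actual kernel code, and `max_timeout_seconds()` and `wrapped_jiffies()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Largest timeout (seconds) whose seconds * hz value still fits in int32_t. */
int64_t max_timeout_seconds(int64_t hz)
{
    assert(hz > 0);
    return INT32_MAX / hz;
}

/*
 * Value a signed 32-bit counter would end up holding after storing
 * seconds * hz, emulating two's-complement wraparound without relying
 * on undefined signed overflow. Valid while seconds * hz fits in int64_t.
 */
int32_t wrapped_jiffies(int64_t seconds, int64_t hz)
{
    assert(seconds >= 0 && hz > 0);
    int64_t v = (seconds * hz) % 4294967296LL;   /* reduce modulo 2^32 */
    if (v >= 2147483648LL)                       /* top bit set: negative */
        v -= 4294967296LL;
    return (int32_t)v;
}
```

With HZ=1000 the limit is 2147483 seconds, roughly 24.9 days; one second more and the stored value becomes negative, which is consistent with the timer firing at the first disconnection. With HZ=250 the same reasoning gives 8589934 seconds, roughly 99 days.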
It might sound crazy, but I would like higher values. The thing is, we
have (virtual) machines with almost-abandoned services, and if those
freeze for 24 days we might not notice it, and then we could start having
errors and potentially filesystem corruption. Ideally I would like an
infinite timeout, like a magic value that makes the counter never
expire. Since you seem to be using a signed value and 0 is already used
for no-timeout, -1

What kernel version are you using? We have exactly that :) If you set it to -1 then you get your infinite timeout.
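The three cases discussed here (a negative value meaning "never expire", 0 meaning no waiting, a positive value meaning a wait in seconds) can be modelled with a small predicate. This is a minimal sketch: `replacement_timer_expired()` is a hypothetical helper, and reading 0 as "fail queued IO immediately" is an interpretation of the "no-timeout" remark above, not a quote of the open-iscsi code.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: should queued IO be failed after the session has
 * been down for `elapsed` seconds, given replacement timeout `tmo`?
 * Assumed convention:
 *   tmo <  0  -> never fail; IO stays queued until the session returns
 *   tmo == 0  -> fail IO immediately
 *   tmo >  0  -> fail IO once `tmo` seconds have passed
 */
bool replacement_timer_expired(long tmo, long elapsed)
{
    assert(elapsed >= 0);
    if (tmo < 0)
        return false;        /* infinite: the timer never fires */
    return elapsed >= tmo;   /* tmo == 0 fires straight away */
}
```

Because the negative case is tested before any comparison, an "infinite" setting never depends on how large a seconds * HZ product can grow.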


--
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/open-iscsi?hl=en.
