Hello

Andreas Dilger wrote:
> On May 3, 2011, at 13:41, Nathan Rutman wrote:
>   
>> On May 3, 2011, at 10:09 AM, DEGREMONT Aurelien wrote:
>>     
>>> Correct me if I'm wrong, but when I look at the Lustre manual, it says 
>>> that the client adapts its timeout, but not the server. My understanding 
>>> is that server->client RPCs still use the old mechanism, especially in 
>>> our case, where it seems the server is revoking a client lock (is 
>>> ldlm_timeout used for that?) and the client did not respond.
>>>       
>> Server and client cooperate together for the adaptive timeouts.  I don't 
>> remember which bug the ORNL settings were in, maybe 14071, bugzilla's not 
>> responding at the moment.  But a big question here is why 25315 seconds for 
>> a callback - that's well beyond anything at_max should allow...
>>     
>
> I assume that the 25315s is from a bug (fixed in 1.8.5 I think, not sure if 
> it was ported to 2.x) that calculated the wrong time when printing this error 
> message for LDLM lock timeouts.
>   
I did not find the bug for that.
>>> I forgot to say that we have LNET routers also involved for some cases.
>>>       
> If there are routers they can cause dropped RPCs from the server to the 
> client, and the client will be evicted for unresponsiveness even though it is 
> not at fault.  At one time Johann was working on (or at least investigating) 
> a patch to have servers resend RPCs before evicting clients.  The tricky 
> part is that you don't want to send 2 RPCs each with 1/2 the timeout 
> interval, since that may reduce stability instead of increasing it.
>   
How can I track those dropped RPCs on routers?
Is this expected behaviour? How can I protect my filesystem from it? 
Increasing the timeout won't change anything if the client and server 
do not resend their RPCs.
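For what it's worth, one rough way I have been checking for drops so far is to poll the LNET counters on each router and compare the adaptive-timeout tunables on the clients. This is only a sketch from our 1.8-era setup; the proc path and parameter names are assumptions from the manual, so adjust for your version:

```shell
# On each LNET router: dump the aggregate LNET counters. The single line
# of numbers includes send/recv/route/drop message counts and byte lengths;
# a drop count that grows while clients get evicted points at the router.
cat /proc/sys/lnet/stats

# On a client: adaptive-timeout tunables, for comparison with the servers.
# (Parameter names taken from the 1.8 manual; treat them as assumptions.)
lctl get_param at_max at_min timeout
```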

> I think the bugzilla bug was called "limited server-side resend" or similar, 
> filed by me several years ago.
>   
I did not find that one either :)

Aurélien
_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss