Hello

Andreas Dilger wrote:
> On May 3, 2011, at 13:41, Nathan Rutman wrote:
>
>> On May 3, 2011, at 10:09 AM, DEGREMONT Aurelien wrote:
>>
>>> Correct me if I'm wrong, but when I look at the Lustre manual, it says
>>> that the client adapts its timeout, but not the server. I understood
>>> that server->client RPCs still use the old mechanism, especially in our
>>> case, where it seems the server is revoking a client lock (ldlm_timeout is
>>> used for that?) and the client did not respond.
>>>
>> Server and client cooperate together for the adaptive timeouts. I don't
>> remember which bug the ORNL settings were in, maybe 14071, bugzilla's not
>> responding at the moment. But a big question here is why 25315 seconds for
>> a callback - that's well beyond anything at_max should allow...
>>
>
> I assume that the 25315s is from a bug (fixed in 1.8.5 I think, not sure if
> it was ported to 2.x) that calculated the wrong time when printing this error
> message for LDLM lock timeouts.
>
I did not find the bug for that.

>>> I forgot to say that we also have LNET routers involved in some cases.
>>>
> If there are routers they can cause dropped RPCs from the server to the
> client, and the client will be evicted for unresponsiveness even though it is
> not at fault. At one time Johann was working on a patch (or at least
> investigating) the ability to have servers resend RPCs before evicting
> clients. The tricky part is that you don't want to send 2 RPCs each with 1/2
> the timeout interval, since that may reduce stability instead of increasing
> it.
>
How can I track those dropped RPCs on the routers?
Is this expected behaviour?
How can I protect my filesystem from it? Increasing the timeout won't
change anything if the client and server do not resend their RPCs.

> I think the bugzilla bug was called "limited server-side resend" or similar,
> filed by me several years ago.
>
I did not find that one either :)

Aurélien
_______________________________________________
Lustre-discuss mailing list
http://lists.lustre.org/mailman/listinfo/lustre-discuss
