Hi Cory

I'm running Lustre 2.10 there, and idle disconnect is not available in 2.10.
The steps I'm thinking of so far are:

1. - Client connects, opening a TCP socket to the server
2. - Client acquires an LDLM lock
3. - The TCP connection gets broken
4. - Soon after, a conflicting lock is enqueued. The server needs to cancel the
lock held by the client.
5. - Server tries to send an LDLM callback through LNET
6. - LNET initiates a TCP connection in the other direction, as the existing
socket no longer exists.

If the time between 2. and 4. is long enough, either the client will have time
to reconnect (next ping?) or the server will evict the client because it has
not received any message from it for a while.
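
The race described above can be sketched as a small timeline model. This is only an illustration under stated assumptions: the function name, its parameters and the four outcome strings are hypothetical, and the real behaviour depends on Lustre ping intervals and timeouts that are not modelled here.

```python
def outcome(t_break, t_conflict, t_client_reconnect, t_evict):
    """Return what happens first once the TCP connection is lost.

    All arguments are times in seconds on a common clock (hypothetical model):
      t_break            -- step 3, the socket is lost
      t_conflict         -- step 4, a conflicting lock is enqueued
      t_client_reconnect -- when the client next sends something (e.g. a ping)
      t_evict            -- when the server would evict the silent client
    """
    if t_conflict < t_break:
        # The callback is sent while the client-initiated socket still exists.
        return "callback sent on existing socket"
    first = min(t_conflict, t_client_reconnect, t_evict)
    if first == t_client_reconnect:
        return "client reconnects first"
    if first == t_evict:
        return "client evicted first"
    # The conflicting lock arrives while no socket exists: steps 5 and 6.
    return "server opens connection to client"


print(outcome(t_break=10, t_conflict=12, t_client_reconnect=35, t_evict=110))
```

Only the first case after the break, where the conflicting lock arrives before either a client reconnect or an eviction, leads to the server-initiated connection.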

This is difficult to reproduce, as I don't know how to force-close the socket
that socklnd has opened.


Aurélien

On 19/02/2020 18:19, "Spitz, Cory James" <[email protected]> wrote:

    Hello, Aurélien.  I'm guessing that if you have modern Lustre then idle
    clients may disconnect, and so you might regularly see Lustre servers
    initiate the socket connection again.  I'm not sure how to show whether
    that is the case or not.  Perhaps someone else can chime in on whether that
    could be it and, if so, how to prove it.
    
    -Cory
    
    
    On 2/19/20, 2:35 AM, "lustre-discuss on behalf of Degremont, Aurelien"
    <[email protected] on behalf of [email protected]> wrote:
    
        Thanks! That's really interesting.
        Do you have a code pointer that shows where the code establishes
        this connection if it is missing?
        
        On 18/02/2020 23:34, "NeilBrown" <[email protected]> wrote:
        
            
            It is not true that:
               LNET will establish connections only if asked for by upper
               layers.
            
            or at least, not in the sense that the upper layers ask for a
            connection.
            Lustre knows nothing about connections.  Even LNet doesn't really
            know about connections. It is only at the socklnd level that
            connections mean much.
            
            Lustre and LNet are message-passing protocols.
            Lustre asks LNet to send a message to a given peer, and gives some
            details of the sort of reply to expect.
            LNet chooses a route and thus a network interface, and asks the
            LND to send the message.
            The socklnd LND will see if it already has a TCP connection.  If it
            does, it will use it.  If not, it will create one.
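
The connect-on-demand behaviour described above can be illustrated with a toy model. This is a minimal sketch and not socklnd code: the `ToyLnd` class, its methods and the `"client"`/`"server"` identifiers are hypothetical names chosen for illustration.

```python
class ToyLnd:
    """Toy model of an LND that connects on demand (not real socklnd code)."""

    def __init__(self, nid):
        self.nid = nid
        self.conns = {}  # peer NID -> NID of the node that initiated the connection
        self.log = []

    def send(self, peer, msg):
        # Reuse an existing connection if there is one, otherwise create it.
        if peer.nid not in self.conns:
            self.conns[peer.nid] = self.nid  # this node is the initiator
            peer.conns[self.nid] = self.nid  # the peer sees an incoming connection
            self.log.append(f"connect {self.nid} -> {peer.nid}")
        self.log.append(f"send {msg!r} to {peer.nid}")

    def drop(self, peer):
        # Simulate the TCP connection being lost on both ends.
        self.conns.pop(peer.nid, None)
        peer.conns.pop(self.nid, None)


client = ToyLnd("client")
server = ToyLnd("server")
client.send(server, "lock enqueue")   # the client initiates the first connection
client.drop(server)                   # the connection breaks
server.send(client, "lock callback")  # no connection exists, so the server initiates
print(server.log)
```

In this model neither node has a fixed client or server role at the connection level: whichever side needs to send first while no connection exists becomes the initiator.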
            
            So yes, it is exactly:
              possible that the server in this case opens the connection itself
              without waiting for the client to reconnect?
            
            NeilBrown
            
            
            On Tue, Feb 18 2020, Aurelien Degremont wrote:
            
            > Thanks for your reply.
            > I think I have a good enough understanding of LNET itself. My
            > question was more about how LNET is being used by Lustre itself.
            >
            > LNET will establish connections only if asked for by upper layers.
            > When I was talking about client and server, I was talking about
            > how Lustre was using it.
            >
            > As far as I understood, Lustre servers only contact clients when
            > they need to send LDLM callbacks.
            > They do so through the socket already opened by the client
            > (reverse import).
            > What happens if the socket is closed is what I'm not sure about. I
            > thought the server would rather wait for the client to reconnect
            > and, if it did not, would more or less evict it.
            > Could it be possible that the server in this case opens the
            > connection itself without waiting for the client to reconnect?
            >
            >
            > Aurélien
            >
            > On 18/02/2020 05:42, "NeilBrown" <[email protected]> wrote:
            >
            >     
            >     LNet is a peer-to-peer protocol, it has no concept of client
            >     and server.
            >     If one host needs to send a message to another but doesn't
            >     already have a connection, it creates a new connection.
            >     I don't yet know enough specifics of the lustre protocol to
            >     be certain of the circumstances when a lustre server will need
            >     to initiate a message to a client, but I imagine that recalling
            >     a lock might be one.
            >     
            >     I think you should assume that any LNet node might receive a
            >     connection from any other LNet node (for which they share an
            >     LNet network), and that the connection could come from any port
            >     between 512 and 1023
            >     (LNET_ACCEPTOR_MIN_PORT to LNET_ACCEPTOR_MAX_PORT).
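
As an illustration of what this implies for filtering, the check below classifies a TCP connection attempt by its ports. It is a sketch under assumptions: the function name and the `LNET_ACCEPTOR_PORT` constant name are hypothetical, the value 988 is the default listening port mentioned in the original question, and the 512 to 1023 range is the one quoted above. It takes no position on which host is the Lustre client or server, since either side may initiate.

```python
LNET_ACCEPTOR_PORT = 988       # hypothetical name; default port from the question
LNET_ACCEPTOR_MIN_PORT = 512   # lowest source port, as quoted above
LNET_ACCEPTOR_MAX_PORT = 1023  # highest source port, as quoted above


def looks_like_lnet_connect(src_port, dst_port):
    """Return True if the port pair matches an LNet connection attempt."""
    return (
        dst_port == LNET_ACCEPTOR_PORT
        and LNET_ACCEPTOR_MIN_PORT <= src_port <= LNET_ACCEPTOR_MAX_PORT
    )


# The same rule applies in both directions, client to server or server to client.
print(looks_like_lnet_connect(1022, 988))
print(looks_like_lnet_connect(40000, 988))
```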
            >     
            >     NeilBrown
            >     
            >     
            >     
            >     On Mon, Feb 17 2020, Degremont, Aurelien wrote:
            >     
            >     > Hi all,
            >     >
            >     > From what I've understood so far, LNET listens on port 988
            >     > by default and peers connect to it using TCP ports 1021-1023
            >     > as source ports.
            >     > At the Lustre level, servers listen on 988 and clients connect
            >     > to them using the same source ports 1021-1023.
            >     > So only accepting connections to port 988 on the server side
            >     > sounded pretty safe to me. However, I've sometimes seen
            >     > connections from 1021-1023 to 988, from server hosts to client
            >     > hosts.
            >     > I can't understand what mechanism could trigger these
            >     > connections. Did I miss something?
            >     >
            >     > Thanks
            >     >
            >     > Aurélien
            >     >
            >     > _______________________________________________
            >     > lustre-discuss mailing list
            >     > [email protected]
            >     > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
            >     
            
        
        
    
    
