Thanks. The theory looks valid. Ideally, to confirm it, I would need a way to manually force-close the socklnd socket so that the other peer has to re-establish it. I could not find a way to do that for sockets opened by kernel threads.

On 19/02/2020 23:12, "NeilBrown" <[email protected]> wrote:

When LNet wants to send a message over a SOCKLND interface,
ksocknal_launch_packet() is called.
This calls ksocknal_launch_all_connections_locked().
This loops over all "routes" to the "peer" to make sure they all have
"connections".
If it finds a route without a connection (returned by
ksocknal_find_connectable_route_locked()), it calls
ksocknal_launch_connection_locked(), which adds the connection request
to ksnd_connd_routes and wakes up the connd.
The connd thread will then make the connection.

Hope that helps.

NeilBrown

On Wed, Feb 19 2020, Degremont, Aurelien wrote:

> Thanks! That's really interesting.
> Do you have a code pointer that could show where the code will establish this connection if missing?
>
> On 18/02/2020 23:34, "NeilBrown" <[email protected]> wrote:
>
> It is not true that:
>     LNET will establish connections only if asked for by upper layers.
>
> or at least, not in the sense that the upper layers ask for a
> connection.
> Lustre knows nothing about connections. Even LNet doesn't really know
> about connections. It is only at the socklnd level that connections mean
> much.
>
> Lustre and LNet are message-passing protocols.
> Lustre asks LNet to send a message to a given peer, and gives some
> details of the sort of reply to expect.
> LNet chooses a route and thus a network interface, and asks the LND to
> send the message.
> The socklnd LND will see if it already has a TCP connection. If it
> does, it will use it. If not, it will create one.
>
> So yes, it is exactly:
>     possible that the server in this case opens the connection itself
>     without waiting for the client to reconnect?
>
> NeilBrown
>
>
> On Tue, Feb 18 2020, Aurelien Degremont wrote:
>
> > Thanks for your reply.
> > I think I have a good enough understanding of LNET itself. My question was more about how LNET is being used by Lustre itself.
> >
> > LNET will establish connections only if asked for by upper layers.
> > When I was talking about client and server, I was talking about how Lustre was using it.
> >
> > As far as I understood, Lustre servers only contact clients when they need to send LDLM callbacks.
> > They do so through the socket already opened by the client (reverse import).
> > What happens if the socket is closed is what I'm not sure about. I thought the server would rather wait for the client to reconnect and, if it does not, more or less evict it.
> > Could it be possible that the server in this case opens the connection itself without waiting for the client to reconnect?
> >
> >
> > Aurélien
> >
> > On 18/02/2020 05:42, "NeilBrown" <[email protected]> wrote:
> >
> > LNet is a peer-to-peer protocol; it has no concept of client and server.
> > If one host needs to send a message to another but doesn't already have
> > a connection, it creates a new connection.
> > I don't yet know enough specifics of the lustre protocol to be certain
> > of the circumstances when a lustre server will need to initiate a message
> > to a client, but I imagine that recalling a lock might be one.
> >
> > I think you should assume that any LNet node might receive a connection
> > from any other LNet node (for which they share an LNet network), and
> > that the connection could come from any port between 512 and 1023
> > (LNET_ACCEPTOR_MIN_PORT to LNET_ACCEPTOR_MAX_PORT).
> >
> > NeilBrown
> >
> >
> > On Mon, Feb 17 2020, Degremont, Aurelien wrote:
> >
> > > Hi all,
> > >
> > > From what I've understood so far, LNET listens on port 988 by default and peers connect to it using 1021-1023 TCP ports as source ports.
> > > At Lustre level, servers listen on 988 and clients connect to them using the same source ports 1021-1023.
> > > So only accepting connections to port 988 on server side sounded pretty safe to me.
> > > However, I've seen connections from 1021-1023 to 988, from server hosts to client hosts sometimes.
> > > I can't understand what mechanism could trigger these connections. Did I miss something?
> > >
> > > Thanks
> > >
> > > Aurélien
> > >
> > > _______________________________________________
> > > lustre-discuss mailing list
> > > [email protected]
> > > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
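
For anyone following the thread without the socklnd source at hand, the send path Neil describes can be mimicked with a small toy model. This is only a sketch in Python, not Lustre code: the class and method names loosely mirror the ksocknal_* functions named in the thread, the NID "10.0.0.2@tcp" is made up, and the detail that a message waits in a per-peer queue until a connection exists is an assumption of this model rather than something stated in the thread.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Route:
    """One path to a peer; `connected` stands in for an established TCP socket."""
    name: str
    connected: bool = False
    connecting: bool = False


@dataclass
class Peer:
    nid: str  # hypothetical NID, for illustration only
    routes: list = field(default_factory=list)
    tx_queue: deque = field(default_factory=deque)


class ToySockLnd:
    """Toy stand-in for socklnd: connections are created on demand when sending."""

    def __init__(self):
        self.connd_routes = deque()  # plays the role of ksnd_connd_routes
        self.sent = []               # (nid, message) pairs actually "sent"

    def find_connectable_route(self, peer):
        # Return a route that has no connection and no connect in progress.
        for route in peer.routes:
            if not route.connected and not route.connecting:
                return route
        return None

    def launch_all_connections(self, peer):
        # Queue a connection request for every route that lacks a connection.
        while (route := self.find_connectable_route(peer)) is not None:
            route.connecting = True
            self.connd_routes.append((peer, route))

    def launch_packet(self, peer, msg):
        # Sending is what triggers connection setup; no upper layer asks for it.
        self.launch_all_connections(peer)
        if any(r.connected for r in peer.routes):
            self.sent.append((peer.nid, msg))
        else:
            peer.tx_queue.append(msg)  # assumption: wait until connd connects

    def connd_run_once(self):
        # The "connd" worker: establish queued connections, then flush messages.
        while self.connd_routes:
            peer, route = self.connd_routes.popleft()
            route.connecting = False
            route.connected = True
            while peer.tx_queue:
                self.sent.append((peer.nid, peer.tx_queue.popleft()))


def demo():
    lnd = ToySockLnd()
    client = Peer("10.0.0.2@tcp", [Route("r0")])
    lnd.launch_packet(client, "LDLM callback")  # no socket yet, so it is queued
    queued = len(client.tx_queue)
    lnd.connd_run_once()                         # connd connects and flushes
    return queued, lnd.sent
```

The point of the model is that whichever side has something to send, a server included, ends up opening the socket if none exists.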

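Read as a filtering rule, the port advice in the thread amounts to accepting TCP connections to the acceptor port from the whole reserved source range, on every LNet node rather than on servers only. A minimal sketch of that check follows; the constant and function names are illustrative and not taken from the Lustre source, and 988 is simply the default port mentioned in the thread.

```python
ACCEPTOR_PORT = 988   # default LNet listening port, as stated in the thread
MIN_SRC_PORT = 512    # lower bound of the source range given in the thread
MAX_SRC_PORT = 1023   # upper bound of the source range given in the thread


def is_plausible_lnet_connection(src_port: int, dst_port: int) -> bool:
    """True if an incoming TCP connection matches the ports described above."""
    return dst_port == ACCEPTOR_PORT and MIN_SRC_PORT <= src_port <= MAX_SRC_PORT
```

Because any node may initiate the connection, the same rule would have to be applied on clients as well as on servers.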