I can't comment on the LNet peer discovery part, but I would definitely not recommend leaving the lnet_transaction_timeout that low for normal usage. This can cause messages to be dropped while the server is still processing them, which introduces failures needlessly.
Cheers, Andreas

> On Oct 26, 2023, at 09:48, Bertschinger, Thomas Andrew Hjorth via lustre-discuss <[email protected]> wrote:
>
> Hello,
>
> Recently we had an OSS node down for an extended period with hardware problems. While the node was down, mounting lustre on a client took an extremely long time to complete (20-30 minutes). Once the fs is mounted, all operations are normal and there isn't any noticeable impact from the absent node.
>
> While the client is mounting, the client's debug log shows entries like this slowly going by:
>
> 00000020:00000080:87.0:1698333195.993098:0:3801046:0:(obd_config.c:1384:class_process_config()) processing cmd: cf005
> 00000020:00000080:87.0:1698333195.993099:0:3801046:0:(obd_config.c:1396:class_process_config()) adding mapping from uuid 10.1.2.3@o2ib to nid 0x500000abcd123 (10.1.2.4@o2ib)
>
> and there is a "llog_process_th" kernel thread hanging in lnet_discover_peer_locked().
>
> We have peer discovery enabled on our clients, but disabling peer discovery on a client causes the mount to complete quickly. Also, once the down OSS was fixed and powered back on, mounting completed normally again.
>
> We also found that reducing the following timeout sped up the mount by a factor of ~10:
>
> $ lnetctl set transaction_timeout 5 # was 50 originally
>
> Is such a dramatic slowdown normal in this situation? Is there any fix (aside from disabling peer discovery or tuning down the timeout) that could speed up mounts in case we have another OSS down in the future?
>
> Lustre version (server and client): 2.15.3
>
> Thanks,
> Thomas Bertschinger
> _______________________________________________
> lustre-discuss mailing list
> [email protected]
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
