Hi Lustre users,

I'm looking for a bit of a sanity check here before I go down this path.
I've been dealing with a communication problem over LNet that triggers under certain conditions on one of our clusters after upgrading. I thought we'd solved it by disabling LNet multi-rail, but that doesn't appear to be the case. Here's the report: https://jira.whamcloud.com/browse/LU-18534

I'd like to try switching from ko2iblnd to ksocklnd. The affected file systems are data/scratch for this cluster, but the cluster also accesses other file systems that are more widely used, are working normally, and will continue using @o2ib. So I'll need to set up the cluster clients with both @o2ib and @tcp interfaces on the same InfiniBand devices. Here is what I'm thinking of doing:

- set up the cluster nodes with two NIDs using the same IP (e.g. 172.16.23.100@o2ib and 172.16.23.100@tcp)
- change the NIDs of the scratch and data file system configurations to @tcp by running replace_nids on the MGS
- let the clients continue mounting the other Lustre file systems via @o2ib, but update them to access scratch and data via @tcp NIDs

Does this sound like something that should work, or is it not worth attempting?

Thanks,
Jesse
_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
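P.S. To make the plan concrete, here is a rough sketch of the commands I have in mind. The interface name ib0, the file system name scratch, and the server NIDs 172.16.23.50/.51 are placeholders, not our actual configuration:

```shell
# On each client and on the scratch/data servers: bring up a TCP LND on the
# same IPoIB interface that o2ib already uses (ib0 is a placeholder here).
# Persistent form, in /etc/modprobe.d/lustre.conf:
#   options lnet networks="o2ib(ib0),tcp(ib0)"
# Or dynamically with lnetctl:
lnetctl net add --net tcp --if ib0
lnetctl net show   # should now list NIDs on both o2ib and tcp

# On the MGS, with the affected targets stopped, rewrite the server NIDs
# recorded in the configuration logs (target names and NIDs are examples):
lctl replace_nids scratch-MDT0000 172.16.23.50@tcp
lctl replace_nids scratch-OST0000 172.16.23.51@tcp

# Clients then mount scratch/data via the new @tcp NIDs, while the other
# file systems keep mounting via @o2ib as before:
mount -t lustre 172.16.23.50@tcp:/scratch /mnt/scratch
```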