Hi Lustre users,

I'm looking for a bit of a sanity check here before I go down this path.

I've been dealing with a communication problem over LNet that triggers under 
some conditions on one of our clusters after upgrading. I thought we'd solved 
it by disabling LNet multi-rail, but that doesn't appear to be the case. Here's 
the report:

https://jira.whamcloud.com/browse/LU-18534

I'd like to try switching from ko2iblnd to ksocklnd. These are the data/scratch 
file systems for this cluster, but the cluster also accesses other, more widely 
used file systems that are working normally and will continue using @o2ib. So I 
will need to set up the cluster clients with both @o2ib and @tcp interfaces on 
the same InfiniBand devices.

Here is what I'm thinking of doing:

- set up the cluster nodes with two NIDs using the same IP (e.g. 
172.16.23.100@o2ib and 172.16.23.100@tcp)
- change the NIDs of the scratch and data file system configurations to @tcp by 
using replace_nids on the MGS
- let the clients continue mounting the other Lustre file systems via @o2ib but 
update them to access scratch and data via their @tcp NIDs.
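
Concretely, something like the following sketch is what I have in mind (the 
interface name, NIDs, and target names below are all illustrative, and this 
assumes lnetctl-based configuration):

```shell
# All names here are illustrative (ib0, 172.16.23.x, "scratch").

# On each cluster client: bring up both LNDs on the same IB device.
# ksocklnd runs over IPoIB, so ib0 needs a working IP address.
lnetctl lnet configure
lnetctl net add --net o2ib --if ib0
lnetctl net add --net tcp --if ib0
lnetctl net show    # expect both 172.16.23.100@o2ib and 172.16.23.100@tcp

# On the MGS, with the scratch/data targets stopped (MGS still running),
# rewrite the stored NIDs for each target of the affected file systems:
lctl replace_nids scratch-MDT0000 172.16.23.10@tcp
lctl replace_nids scratch-OST0000 172.16.23.11@tcp
# ...repeat for every MDT/OST of scratch and data...

# Clients then mount scratch/data via @tcp, while the other file
# systems keep their existing @o2ib mounts:
mount -t lustre 172.16.23.10@tcp:/scratch /scratch
```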

Does this sound like something that should work or is it not worth attempting?

Thanks,
Jesse
_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
