Thanks for the response. I used just the defaults on my initial attempt, but yes, I was using o2ib, as that is what is implemented on all the physical servers. If I need to use a different module as you indicate, how would I do that? Via /etc/modprobe.d/lnet.conf or in another file?

Regards
Sid Young
W: https://off-grid-engineering.com

> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of lustre-discuss digest..."
>
> Today's Topics:
>
>    1. Re: Lustre 2.15.5 in a Virtual Machine (Michael DiDomenico)
>
> ---------- Forwarded message ----------
> From: Michael DiDomenico <[email protected]>
> To:
> Cc: lustre-discuss <[email protected]>
> Bcc:
> Date: Fri, 25 Oct 2024 12:47:07 -0400
> Subject: Re: [lustre-discuss] Lustre 2.15.5 in a Virtual Machine
>
> lustre in a vm certainly works as i have many running under vmware and
> mounting lustre
>
> but i'm a little confused on your message. are you trying to bind the
> lustre client via infiniband or tcp/ip? if the latter (assumed based
> on the ens nic prefix), you need to use the ksocklnd not the kiblnd
> module
>
> On Thu, Oct 24, 2024 at 3:17 AM Sid Young <[email protected]> wrote:
> >
> > G'Day all,
> >
> > I'm trying to get lustre to bind to a 100G Mellanox card shared between VMs, but it fails with the following errors in dmesg:
> >
> > [ 406.474952] Lustre: Lustre: Build Version: 2.15.5
> > [ 406.604652] LNetError: 92384:0:(o2iblnd.c:2838:kiblnd_dev_failover()) Failed to bind ens224:10.140.93.72 to device(0000000000000000): -19
> > [ 406.604704] LNetError: 92384:0:(o2iblnd.c:3355:kiblnd_startup()) ko2iblnd: Can't initialize device: rc = -19
> > [ 407.655888] LNetError: 105-4: Error -100 starting up LNI o2ib
> > [ 407.656729] LustreError: 92384:0:(events.c:639:ptlrpc_init_portals()) network initialisation failed
> > [ 559.741846] LNetError: 92993:0:(lib-move.c:2255:lnet_handle_find_routed_path()) peer 10.140.93.42@o2ib has no available nets
> > [ 594.480161] LNetError: 93225:0:(o2iblnd.c:2838:kiblnd_dev_failover()) Failed to bind ens224:10.140.93.72 to device(0000000000000000): -19
> > [ 594.480213] LNetError: 93225:0:(o2iblnd.c:3355:kiblnd_startup()) ko2iblnd: Can't initialize device: rc = -19
> > [ 595.498493] LNetError: 105-4: Error -100 starting up LNI o2ib
> > [ 707.825127] LNetError: 93691:0:(o2iblnd.c:2838:kiblnd_dev_failover()) Failed to bind ens224:10.140.93.72 to device(0000000000000000): -19
> > [ 707.825182] LNetError: 93691:0:(o2iblnd.c:3355:kiblnd_startup()) ko2iblnd: Can't initialize device: rc = -19
> > [ 708.843933] LNetError: 105-4: Error -100 starting up LNI o2ib
> > [ 789.779769] LNetError: 93930:0:(o2iblnd.c:2838:kiblnd_dev_failover()) Failed to bind ens224:10.140.93.72 to device(0000000000000000): -19
> > [ 789.779820] LNetError: 93930:0:(o2iblnd.c:3355:kiblnd_startup()) ko2iblnd: Can't initialize device: rc = -19
> > [ 790.828974] LNetError: 105-4: Error -100 starting up LNI o2ib
> > [root@hpc-vm-02 2.15.5]#
> >
> > The VM has two network interfaces, ens192 and ens224; both are operational with TCP traffic.
> >
> > /etc/modprobe.d/lnet.conf
> > options lnet networks="o2ib(ens224) 10.140.93.*"
> >
> > [root@hpc-vm-02 2.15.5]# lnetctl net add --net o2ib --if ens224
> > add:
> >     - net:
> >           errno: -100
> >           descr: "cannot add network: Network is down"
> > [root@hpc-vm-02 2.15.5]#
> >
> > Any ideas where I might look?
> > Are virtual machines even supported with Lustre?
> >
> > OS is VMWare 7U3 on HP DL385 with 256 cores and 512GB RAM.
> >
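
For reference, a minimal sketch of what switching from the o2ib LND to the socket LND might look like. It assumes ens224 is presented to the VM as a plain Ethernet device with no RDMA device behind it, which is what the -19 (ENODEV) in the log suggests. The network name "tcp" and the decision to keep the setting in /etc/modprobe.d/lnet.conf are assumptions, not something confirmed in this thread. Because the physical servers are on o2ib, a tcp-only client would presumably also need the servers to expose a tcp NID, or an LNet router in between.

```conf
# /etc/modprobe.d/lnet.conf  (assumed location; would replace the o2ib line)
# "tcp" selects the socket LND (ksocklnd) rather than ko2iblnd.
options lnet networks="tcp(ens224)"

# Alternative: configure at runtime instead of via module options.
# Run as root, after unloading any half-initialised Lustre modules:
#   lustre_rmmod
#   modprobe lnet
#   lnetctl lnet configure
#   lnetctl net add --net tcp --if ens224
#   lnetctl net show
```

If the runtime route works, "lnetctl net show" should list a tcp network on ens224 rather than returning the "Network is down" error seen with o2ib.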
_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
