Thanks for the replies. The nodes have multiple interfaces (four on the compute nodes and six on the storage nodes); ens2f0 is the 100G Mellanox ConnectX-5 card in slot 2, and all nodes are running 2.12.6 using the RPMs from the Lustre site.
I will remove one of the network definition files and put the output of lnetctl export --backup into /etc/lnet.conf. I did try a plain export and noticed that it barfs on some of the parameters, but I did not try the --backup option, so that gives me a few options to experiment with for minimising the config; just a bit of trial and error.

I gather, then, that the lustre.conf file is not needed, just /etc/modprobe.d/lnet.conf and /etc/lnet.conf.

Sid Young

> ---------- Forwarded message ----------
> From: "Degremont, Aurelien" <[email protected]>
> To: Sid Young <[email protected]>, lustre-discuss <[email protected]>
> Cc:
> Bcc:
> Date: Tue, 23 Feb 2021 08:47:27 +0000
> Subject: Re: [lustre-discuss] need to always manually add network after reboot
>
> Hello
>
> If I understand correctly, you're saying that you have 2 configuration
> files:
>
> /etc/modprobe.d/lnet.conf
> options lnet networks=tcp
>
> [root@hpc-oss-03 ~]# cat /etc/modprobe.d/lustre.conf
> options lnet networks="tcp(ens2f0)"
> options lnet ip2nets="tcp(ens2f0) 10.140.93.*
>
> That means you are declaring the "networks" option for the "lnet" kernel
> module twice. I don't know how 'modprobe' will behave in that case.
>
> If you have a very simple configuration, where your nodes only have one
> Ethernet interface "ens2f0", you only need the following line, from the 3
> above:
>
> options lnet networks="tcp(ens2f0)"
>
> If this interface is the only Ethernet interface on your host, you don't
> even need a network-specific setup. By default, when loading Lustre, in the
> absence of a network configuration, Lustre will automatically set up the
> only Ethernet interface and use it for "tcp".
>
> Aurélien
>
> *From:* lustre-discuss <[email protected]> on behalf of Sid Young via lustre-discuss <[email protected]>
> *Reply-To:* Sid Young <[email protected]>
> *Date:* Tuesday, 23 February 2021 at 06:59
> *To:* lustre-discuss <[email protected]>
> *Subject:* [EXTERNAL] [lustre-discuss] need to always manually add network after reboot
>
> G'Day all,
>
> I'm finding that when I reboot any node in our new HPC, I need to keep
> manually adding the network using lnetctl net add --net tcp --if ens2f0
>
> Then I can do an lnetctl net show and see the tcp part active...
>
> I have options in /etc/modprobe.d/lnet.conf
>
> options lnet networks=tcp
>
> and
>
> [root@hpc-oss-03 ~]# cat /etc/modprobe.d/lustre.conf
> options lnet networks="tcp(ens2f0)"
> options lnet ip2nets="tcp(ens2f0) 10.140.93.*
>
> I've read the doco and tried to understand the correct parameters for a
> simple Lustre config, so this is what I worked out is needed... but I
> suspect it's still wrong.
>
> Any help appreciated :)
>
> Sid Young
>
> ---------- Forwarded message ----------
> From: Angelos Ching <[email protected]>
> To: [email protected]
> Cc:
> Bcc:
> Date: Tue, 23 Feb 2021 18:06:02 +0800
> Subject: Re: [lustre-discuss] need to always manually add network after reboot
>
> Hi Sid,
>
> Notice that you are using lnetctl net add to add the lnet network, which
> means you should be using a recent version of Lustre that depends on
> /etc/lnet.conf for boot-time lnet configuration.
>
> You can save the current lnet configuration using the command:
> lnetctl export --backup > /etc/lnet.conf
> (make a backup of the original file first if required)
>
> On next boot, lnet.service will load your lnet configuration from the file.
>
> Or you can manually build lnet.conf, as lnetctl seems to have occasional
> problems with some of the fields exported by "lnetctl export --backup".
>
> Attaching my simple lnet.conf for your reference:
>
> # cat /etc/lnet.conf
> ip2nets:
>   - net-spec: o2ib
>     ip-range:
>       0: 10.2.8.*
>   - net-spec: tcp
>     ip-range:
>       0: 10.5.9.*
> route:
>   - net: o2ib
>     gateway: 10.5.9.25@tcp
>     hop: -1
>     priority: 0
>   - net: o2ib
>     gateway: 10.5.9.24@tcp
>     hop: -1
>     priority: 0
> global:
>   numa_range: 0
>   max_intf: 200
>   discovery: 1
>   drop_asym_route: 0
>
> Best regards,
> Angelos
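As a rough sketch of how the duplicate "networks" declaration discussed in this thread could be spotted, the shell helpers below list every "options lnet" line that sets networks= or ip2nets= in a modprobe.d-style directory, and count how many files declare networks=. The function names find_lnet_opts and count_networks_files are made up for illustration and are not part of Lustre; the default directory /etc/modprobe.d is taken from the messages above.

```shell
#!/bin/sh
# Sketch only: find_lnet_opts and count_networks_files are hypothetical
# helper names, not Lustre tools.

# Print (as "file:line") every "options lnet" line that sets networks= or
# ip2nets= in the *.conf files of a modprobe.d-style directory.
find_lnet_opts() {
    dir="${1:-/etc/modprobe.d}"
    # -H always prints the file name; -E enables extended regular expressions.
    grep -HE '^[[:space:]]*options[[:space:]]+lnet[[:space:]]+.*(networks|ip2nets)=' \
        "$dir"/*.conf 2>/dev/null
}

# Print the number of distinct files that declare networks= for lnet.
# More than one means the same module parameter is set in several places.
count_networks_files() {
    find_lnet_opts "$1" | grep 'networks=' | cut -d: -f1 | sort -u | wc -l | tr -d ' '
}
```

On a node set up as in the original post (and assuming no other files under /etc/modprobe.d mention lnet), count_networks_files would be expected to print 2, for lnet.conf and lustre.conf; after one of the two files is removed it should print 1.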
_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
