Awesome, thanks! Unfortunately the password reset site is not finding my UID. Maybe I never had access to the Lustre wiki. (I have so many accounts that sometimes my head spins.) I'm still willing to help. Is there a request password site?
Cheers,
megan

On Fri, Jun 26, 2020 at 8:54 PM Spitz, Cory James <[email protected]> wrote:

> Megan,
>
> You wrote:
> PS. [I am willing to add/contribute to the
> http://wiki.lustre.org/Infiniband_Configuration_Howto but I think my
> account for wiki editing has expired (at least the one I thought I had
> did not work).
>
> Thank you for your offer! Did you try
> http://wiki.lustre.org/Special:PasswordReset? If that didn’t work then I
> think that you could email [email protected].
>
> -Cory
>
> On 6/24/20, 3:33 PM, "lustre-discuss on behalf of Ms. Megan Larko"
> <[email protected] on behalf of [email protected]> wrote:
>
> On 22 Jun 2020 "guru.novice" wrote:
>
> Hi, all
> We set up a cluster using a mix of mlx4 and mlx5 drivers, and everything
> works well. Later I found something in the wiki
> http://wiki.lustre.org/Infiniband_Configuration_Howto and
> http://lists.onebuilding.org/pipermail/lustre-devel-lustre.org/2016-May/003842.html
> which was last edited in 2016.
> So do I need to change the LNet configuration as described on this page?
> Or has the problem been resolved in newer versions (like 2.12.x)?
> Also, where can I find more details?
>
> Any suggestions would be appreciated.
> Thanks!
>
> Hello guru.novice,
>
> Lustre 2.12.x has some nice LNet configuration abilities. The old
> /etc/modprobe.d/ config files have been superseded by /etc/lnet.conf. An
> install of Lustre 2.12.x provides a sample of this file (with the lines
> commented out). Our experience has shown that not all lines are
> necessary; edit to suit.
>
> Lustre 2.12.x has Multi-Rail (MR) on by default, so Lustre will attempt
> to automatically find active and viable LNet paths to use. This should
> have no issue with your mixed mlx4/mlx5 environment; we have some mixed
> IB and Ethernet setups that work. To explicitly use MR one may set
> "Multi-Rail: true" in the "peer" NID section of the /etc/lnet.conf file,
> but that was not necessary for us.
>
> We used a simple /etc/lnet.conf for MR systems:
>
> File stub: /etc/lnet.conf
>
> net:
>     - net type: o2ib0
>       local NI(s):
>         - interfaces:
>               0: ib0
>     - net type: o2ib777
>       local NI(s):
>         - interfaces:
>               0: ib0:1
>
> This allowed LNet to use any NID on o2ib0 and o2ib777.
>
> Whatever is placed in the /etc/lnet.conf file is loaded into the kernel
> modules via the Lustre starting mechanism (CentOS uses
> /usr/lib/systemd/system). Because we are choosing _not_ to use MR on a
> different box, we explicitly defined the available routes in
> /etc/lnet.conf using the lines:
>
> route:
>     - net: tcp
>       gateway: 10.10.10.101@o2ib11111
>     - net: tcp
>       gateway: 10.10.10.102@o2ib1111
>
> And so on up to 10.10.10.116@o2ib1111
>
> The CentOS 7 /usr/lib/systemd/system/lnet.service file is reproduced
> below. (details: lustre-2.12.4-1 with Mellanox OFED version 4.7-1.0.0.1
> and kernel 3.10.957.27.2.el7)
>
> File lnet.service:
>
> [Unit]
> Description=lnet management
> Requires=network-online.target
> After=network-online.target openibd.service rdma.service opa.service
> ConditionPathExists=!/proc/sys/lnet/
>
> [Service]
> Type=oneshot
> RemainAfterExit=true
> ExecStart=/sbin/modprobe lnet
> ExecStart=/usr/sbin/lnetctl lnet configure
> ExecStart=/usr/sbin/lnetctl set discover 0  <--Do NOT use this line if you want MR function
> ExecStart=/usr/sbin/lnetctl import /etc/lnet.conf  <--The file with router, credit and similar info
> ExecStart=/usr/sbin/lnetctl peer add --nid 10.10.10.[101-116]@o2ib11111 --non_mr  <--Omit --non_mr if you want to use MR
> ExecStop=/usr/sbin/lustre_rmmod ptlrpc
> ExecStop=/usr/sbin/lnetctl lnet unconfigure
> ExecStop=/usr/sbin/lustre_rmmod libcfs ldiskfs
>
> [Install]
> WantedBy=multi-user.target
>
> I hope this info can help point you in the right direction.
>
> Cheers,
> megan
>
> PS. [I am willing to add/contribute to the
> http://wiki.lustre.org/Infiniband_Configuration_Howto but I think my
> account for wiki editing has expired (at least the one I thought I had
> did not work).
>
> Our site had issues with Multi-Rail "not socially distancing
> appropriately" from other LNet networks, so in our particular case we
> disabled MR. (An entirely different experience.) ]
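
For the "and so on up to 10.10.10.116" route list in the quoted message, typing sixteen near-identical entries by hand is easy to get wrong. Below is a minimal Python sketch that prints such a route: section. The helper name render_routes and its parameters are made up for illustration, and the indentation is an assumption based on the usual /etc/lnet.conf layout. The gateway network name is passed in as a parameter because the quoted message spells it both o2ib11111 and o2ib1111; use whichever matches your site.

```python
def render_routes(remote_net, gw_prefix, first, last, gw_net):
    """Return a 'route:' stanza with one entry per gateway.

    Hypothetical helper: remote_net is the network reached through the
    gateways, gw_prefix is the first three octets of the gateway
    addresses, first/last bound the final octet (inclusive), and gw_net
    is the LNet network the gateways sit on.
    """
    lines = ["route:"]
    for host in range(first, last + 1):
        lines.append(f"    - net: {remote_net}")
        lines.append(f"      gateway: {gw_prefix}.{host}@{gw_net}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Addresses come from the quoted message; the gateway network name
    # used here is an assumption, so adjust it for your own site.
    print(render_routes("tcp", "10.10.10", 101, 116, "o2ib1111"), end="")
```

The printed text could be pasted into /etc/lnet.conf and loaded with "lnetctl import /etc/lnet.conf", the same step the quoted lnet.service file uses.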
_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
