I am running out of ideas.  I also searched old messages on this list and on 
Google and found an unanswered question from Aug 2020:
https://www.mail-archive.com/[email protected]/msg16346.html


Hello Laura,
I tried your recommendation of passing --kmp to mlnx_add_kernel_support.sh (my 
steps are below), but I still get the same ksym error for Lustre clients.
I also see that KMP support is still not enabled (maybe KMP support is 
only available on Red Hat and SUSE, but not on CentOS, Oracle Linux, etc., based on 
this link:
https://docs.mellanox.com/display/MLNXOFEDv461000/Installing+Mellanox+OFED ).

Step1:  On build server:
./mlnx_add_kernel_support.sh --make-tgz --verbose --yes --kernel 
3.10.0-1160.15.2.el7_lustre.x86_64 --kernel-sources 
/usr/src/kernels/3.10.0-1160.15.2.el7_lustre.x86_64 --tmpdir /tmp --distro 
ol7.9 --mlnx_ofed /root/MLNX_OFED_LINUX-5.3-1.0.0.1-ol7.9-x86_64 --kmp
….
Detected MLNX_OFED_LINUX-5.3-1.0.0.1
….
Building MLNX_OFED_LINUX RPMS . Please wait...
….

Running MLNX_OFED_SRC-5.3-1.0.0.1/install.pl --tmpdir /tmp/mlnx_iso.7168_logs 
--kernel-only --kernel 3.10.0-1160.15.2.el7_lustre.x86_64 --kernel-sources 
/usr/src/kernels/3.10.0-1160.15.2.el7_lustre.x86_64 --builddir 
/tmp/mlnx_iso.7168 --build-only --distro ol7.9 --bump-kmp-version 202105221109
….
Creating metadata-rpms for 3.10.0-1160.15.2.el7_lustre.x86_64 ...
Created /tmp/MLNX_OFED_LINUX-5.3-1.0.0.1-ol7.9-x86_64-ext.tgz

Then:
./mlnxofedinstall --kernel 3.10.0-1160.15.2.el7_lustre.x86_64 --kernel-sources 
/usr/src/kernels/3.10.0-1160.15.2.el7_lustre.x86_64 --add-kernel-support 
--skip-repo --skip-distro-check --distro ol7.9 --kmp
…..
Installation finished successfully.
…
Updating / installing...
   1:mlnx-fw-updater-5.3-1.0.0.1      ################################# [100%]
Failed to update Firmware.
See /tmp/MLNX_OFED_LINUX.235507.logs/fw_update.log
…
To load the new driver, run:
/etc/init.d/openibd restart

Ran /etc/init.d/openibd restart
Unloading HCA driver:                                      [  OK  ]
Loading HCA driver and Access Layer:                       [  OK  ]



Step2:  On build server:  Create Lustre client package
./configure --disable-server --enable-client \
--with-linux=/usr/src/kernels/*_lustre.x86_64 \
--with-o2ib=/usr/src/ofa_kernel/default

make rpms
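
The --with-linux value above is a pattern rather than a fixed path. A minimal sketch, assuming exactly one patched kernel tree is expected under /usr/src/kernels, that resolves the pattern to a single directory before it is handed to configure (resolve_one is a hypothetical helper name, not part of the Lustre build system):

```shell
# Hypothetical helper: expand a glob pattern and print the match only if it
# resolves to exactly one existing directory. Paths containing whitespace
# are not supported by this sketch.
resolve_one() {
    # $1 is deliberately unquoted so the shell expands the pattern.
    set -- $1
    if [ "$#" -ne 1 ] || [ ! -d "$1" ]; then
        echo "expected exactly one matching directory" >&2
        return 1
    fi
    printf '%s\n' "$1"
}

# Possible usage (not executed here):
#   linux_dir=$(resolve_one '/usr/src/kernels/*_lustre.x86_64') || exit 1
#   ./configure --disable-server --enable-client \
#       --with-linux="$linux_dir" --with-o2ib=/usr/src/ofa_kernel/default
```

If more than one kernel tree matches, the helper fails instead of silently picking one.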


Step3:  On Lustre client node:  Install MOFED
Untar the MOFED package from Step1.
Run mlnxofedinstall (I tried running with and without --kmp, but got the same 
ksym error).


  a.  Passing --kmp parameter
mlnxofedinstall --force --kernel 3.10.0-1160.15.2.el7_lustre.x86_64  
--kernel-sources /usr/src/kernels/3.10.0-1160.15.2.el7_lustre.x86_64 
--skip-distro-check --distro ol7.9 --kmp

  b.  Not passing --kmp parameter
mlnxofedinstall --force --kernel 3.10.0-1160.15.2.el7_lustre.x86_64  
--kernel-sources /usr/src/kernels/3.10.0-1160.15.2.el7_lustre.x86_64 
--skip-distro-check --distro ol7.9



Step4:  On Lustre client node:

yum localinstall lustre-client-2.14.51-1.el7.x86_64.rpm 
kmod-lustre-client-2.14.51-1.el7.x86_64.rpm
……
Error: Package: kmod-lustre-client-2.14.51-1.el7.x86_64 
(/kmod-lustre-client-2.14.51-1.el7.x86_64)
           Requires: ksym(__ib_create_cq) = 0x1bb05802
Error: Package: kmod-lustre-client-2.14.51-1.el7.x86_64 
(/kmod-lustre-client-2.14.51-1.el7.x86_64)
           Requires: ksym(rdma_listen) = 0xf6bd553e
…..
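
One way to narrow this down might be to compare the exact ksym(...) requirements of the kmod package with what the RPM database on the client provides. The sketch below is illustrative only: missing_ksyms is a hypothetical helper name, and the two input files are assumed to hold the output of `rpm -qp --requires` on the kmod RPM and `rpm -qa --provides` on the client node.

```shell
# Hypothetical helper: print each "ksym(name) = 0xCRC" line from the
# requires file ($1) that has no identical line in the provides file ($2).
missing_ksyms() {
    awk '
        { sub(/[[:space:]]+$/, "") }                  # drop trailing blanks
        FILENAME == ARGV[1] { have[$0] = 1; next }    # first file: provides
        /^ksym\(/ && !($0 in have) { print }          # second file: requires
    ' "$2" "$1"
}

# Possible usage on the client node (not executed here):
#   rpm -qp --requires kmod-lustre-client-2.14.51-1.el7.x86_64.rpm > req.txt
#   rpm -qa --provides > prov.txt
#   missing_ksyms req.txt prov.txt
```

Any line it prints is a symbol/checksum pair that no installed package advertises, which would be consistent with the yum errors above.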

I tried 3 different scenarios:

  1.  Copied the Lustre client RPMs from the build server to the Lustre client 
node and ran the above command: it failed.
  2.  On the Lustre client node: built the Lustre package after running step 3a 
(mlnxofedinstall command with --kmp).
  3.  On the Lustre client node: built the Lustre package after running step 3b 
(mlnxofedinstall command without --kmp).


Any other suggestions?


Sidenote:
For the Lustre servers, I found this workaround; I am not sure yet whether it 
will cause issues once I mount and run Lustre.
Previously, I was installing Lustre on the Lustre servers using the command 
below and getting ksym errors:

sudo yum install lustre-tests -y

But if I use the command below, the install works:
rpm -ivh --nodeps lustre-2.14.51-1.el7.x86_64.rpm  
kmod-lustre-2.14.51-1.el7.x86_64.rpm 
kmod-lustre-osd-ldiskfs-2.14.51-1.el7.x86_64.rpm 
lustre-osd-ldiskfs-mount-2.14.51-1.el7.x86_64.rpm 
lustre-resource-agents-2.14.51-1.el7.x86_64.rpm


I have tested “modprobe lnet” and “lnetctl net add”.  The MGS/MGT mount works, 
but the MDT mount fails with “Invalid filesystem option set: 
dirdata,uninit_bg,^extents,dir_nlink,quota,project,huge_file,ea_inode,large_dir,flex_bg”
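
Because --nodeps bypasses the dependency check that was failing, it may be worth confirming which kernel release the installed modules were actually built for. A minimal sketch, assuming the o2ib module is named ko2iblnd and that `modinfo -F vermagic` is available on the node; vermagic_matches is a hypothetical helper name:

```shell
# Hypothetical helper: succeed if the first field of a module's vermagic
# string (the kernel release it was built for) equals the given release.
vermagic_matches() {
    # $1: vermagic string, e.g. from `modinfo -F vermagic ko2iblnd`
    # $2: kernel release, e.g. from `uname -r`
    set -- "${1%% *}" "$2"
    [ "$1" = "$2" ]
}

# Possible usage on a Lustre server (not executed here):
#   vermagic_matches "$(modinfo -F vermagic ko2iblnd)" "$(uname -r)" \
#       || echo "ko2iblnd was built for a different kernel release"
```

This only compares kernel release strings; it does not check the individual symbol checksums that the ksym requirements encode.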



Thanks,
Pinkesh Valdria


From: Pinkesh Valdria <[email protected]>
Date: Friday, May 21, 2021 at 6:04 PM
To: "[email protected]" <[email protected]>
Subject: MOFED & Lustre 2.14.51 - install fails with dependency failure related 
to ksym/MOFED

Sorry for the long email; I wanted to make sure I share enough details for the 
community to provide guidance.  I am building all Lustre packages for Oracle 
Linux 7.9 (RHCK) and MOFED 5.3-1.0.0.1 using the steps described here:
https://wiki.lustre.org/Compiling_Lustre

Oracle Linux 7.9 – Kernel:  3.10.0-1160.15.2.el7.x86_64

I was able to create the RPM packages below successfully on a node that has the 
same OS, kernel version, MOFED version, and MLNX CX-5 card, but when I 
try to install them on my Lustre nodes, I get a dependency failure related to 
the ksym/MOFED packages (more details below).


  1.  LDISKFS and Patching the Linux Kernel
  2.  MOFED rpms
  3.  Lustre server rpms
  4.  Lustre client rpms

After all RPMs were created, I created a local repo and added it to all Lustre 
nodes:


cat > /etc/yum.repos.d/lustre.repo << EOF
[hpddLustreserver]
name=OL-Lustre-Server
baseurl=file:///home/opc/releases/lustre-server/
gpgcheck=0

[e2fsprogs]
name=CentOS- - Ldiskfs
baseurl=https://downloads.whamcloud.com/public/e2fsprogs/latest/el7/
gpgcheck=0

[hpddLustreclient]
name=OL-Lustre-Client
baseurl=file:///home/opc/releases/lustre-client/
gpgcheck=0
EOF
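
To double-check what the repo file ends up containing, the following sketch lists each repo id with its baseurl. repo_baseurls is a hypothetical helper, not part of yum, and it assumes the simple key=value layout shown above.

```shell
# Hypothetical helper: print "<repo id> <baseurl>" for every section of a
# yum .repo file given as $1.
repo_baseurls() {
    awk -F= '
        /^\[.*\]$/      { id = substr($0, 2, length($0) - 2); next }
        $1 == "baseurl" { print id, $2 }
    ' "$1"
}

# Possible usage (not executed here):
#   repo_baseurls /etc/yum.repos.d/lustre.repo
```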



MOFED is installed and configured on those nodes, and I was able to validate it 
using the IMB-MPI1 pingpong test.
show_gids
mlx5_0 1              2              0000:0000:0000:0000:0000:ffff:c0a8:a985    
        192.168.169.133              v1           enp94s0f0


Dependency failure:  On the OSS nodes, I ran the command below to install all 
Lustre packages:

sudo yum install lustre-tests -y

sudo yum install -y lustre-tests
…
--> Running transaction check
---> Package lustre-tests.x86_64 0:2.14.51-1.el7 will be installed
--> Processing Dependency: lustre-devel = 2.14.51 for package: 
lustre-tests-2.14.51-1.el7.x86_64
…
--> Processing Dependency: liblnetconfig.so.4()(64bit) for package: 
lustre-tests-2.14.51-1.el7.x86_64
--> Running transaction check
---> Package kmod-lustre.x86_64 0:2.14.51-1.el7 will be installed
…..
….
---> Package libcom_err.x86_64 0:1.45.4-3.0.5.el7 will be updated
…
---> Package libss.x86_64 0:1.46.2.wc1-0.el7 will be an update
--> Finished Dependency Resolution
Error: Package: kmod-lustre-2.14.51-1.el7.x86_64 (hpddLustreserver)
           Requires: ksym(ib_map_mr_sg) = 0xcd1ffb73
Error: Package: kmod-lustre-2.14.51-1.el7.x86_64 (hpddLustreserver)
           Requires: ksym(rdma_resolve_route) = 0xc2064869
….
…. All ib/rdma related errors similar to above for kmod-lustre.x
….
Error: Package: kmod-lustre-2.14.51-1.el7.x86_64 (hpddLustreserver)
           Requires: ksym(ib_destroy_cq_user) = 0x5671830b
You could try using --skip-broken to work around the problem
** Found 3 pre-existing rpmdb problem(s), 'yum check' output follows:
oracle-cloud-agent-1.11.1-5104.el7.x86_64 is a duplicate with 
oracle-cloud-agent-1.8.2-3843.el7.x86_64
rdma-core-devel-52mlnx1-1.53100.x86_64 has missing requires of 
pkgconfig(libnl-3.0)
rdma-core-devel-52mlnx1-1.53100.x86_64 has missing requires of 
pkgconfig(libnl-route-3.0)
[opc@inst-dwnv3-topical-goblin ~]$
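
Since the yum output repeats the same package line for every symbol, it can help to reduce it to a unique list of symbol names and checksums. A minimal sketch, assuming the yum output has been saved to a file; required_ksyms is a hypothetical helper name:

```shell
# Hypothetical helper: read yum output on stdin and print one
# "<symbol> <checksum>" line per distinct "Requires: ksym(...)" entry.
required_ksyms() {
    sed -n 's/^[[:space:]]*Requires: ksym(\([^)]*\)) = \(0x[0-9a-f]*\).*$/\1 \2/p' |
        sort -u
}

# Possible usage (not executed here):
#   sudo yum install -y lustre-tests 2>&1 | tee yum.log
#   required_ksyms < yum.log
```

The resulting list is the set of symbol versions that kmod-lustre expects some installed or available package to provide.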





RPMS from:  LDISKFS and Patching the Linux Kernel
ls lustre-kernel/RPMS/

  *   bpftool-3.10.0-1160.15.2.el7_lustre.x86_64.rpm
  *   bpftool-debuginfo-3.10.0-1160.15.2.el7_lustre.x86_64.rpm
  *   kernel-3.10.0-1160.15.2.el7_lustre.x86_64.rpm
  *   kernel-debug-3.10.0-1160.15.2.el7_lustre.x86_64.rpm
  *   kernel-debug-debuginfo-3.10.0-1160.15.2.el7_lustre.x86_64.rpm
  *   kernel-debug-devel-3.10.0-1160.15.2.el7_lustre.x86_64.rpm
  *   kernel-debuginfo-3.10.0-1160.15.2.el7_lustre.x86_64.rpm
  *   kernel-debuginfo-common-x86_64-3.10.0-1160.15.2.el7_lustre.x86_64.rpm
  *   kernel-devel-3.10.0-1160.15.2.el7_lustre.x86_64.rpm
  *   kernel-headers-3.10.0-1160.15.2.el7_lustre.x86_64.rpm
  *   kernel-tools-3.10.0-1160.15.2.el7_lustre.x86_64.rpm
  *   kernel-tools-debuginfo-3.10.0-1160.15.2.el7_lustre.x86_64.rpm
  *   kernel-tools-libs-3.10.0-1160.15.2.el7_lustre.x86_64.rpm
  *   kernel-tools-libs-devel-3.10.0-1160.15.2.el7_lustre.x86_64.rpm
  *   perf-3.10.0-1160.15.2.el7_lustre.x86_64.rpm
  *   perf-debuginfo-3.10.0-1160.15.2.el7_lustre.x86_64.rpm
  *   python-perf-3.10.0-1160.15.2.el7_lustre.x86_64.rpm
  *   python-perf-debuginfo-3.10.0-1160.15.2.el7_lustre.x86_64.rpm


MOFED rpms

Steps followed:
Download the source from the MLNX site:  MLNX_OFED_SRC-5.3-1.0.0.1.tgz
tar -zvxf $HOME/MLNX_OFED_SRC-5.3-1.0.0.1.tgz
cd MLNX_OFED_SRC-5.3-1.0.0.1/
./install.pl --build-only --kernel-only \
--kernel 3.10.0-1160.15.2.el7.x86_64 \
--kernel-sources /usr/src/kernels/3.10.0-1160.15.2.el7.x86_64

cp RPMS/*/*/*.rpm  $HOME/releases/mofed

Question:  I am passing the regular kernel (3.10.0-1160.15.2.el7.x86_64) and its 
source (not the Lustre-patched kernel) as input to the MOFED install command 
above.  I hope that is correct.



  *   kernel-mft-4.16.3-12.kver.3.10.0_1160.15.2.el7.x86_64.x86_64.rpm
  *   knem-1.1.4.90mlnx1-OFED.5.1.2.5.0.1.ol7u9.x86_64.rpm
  *   knem-modules-1.1.4.90mlnx1-OFED.5.1.2.5.0.1.kver.3.10.0_1160.15.2.el7.x86_64.x86_64.rpm
  *   mlnx-nfsrdma-5.3-OFED.5.3.0.3.8.1.kver.3.10.0_1160.15.2.el7.x86_64.x86_64.rpm
  *   mlnx-nfsrdma-debuginfo-5.3-OFED.5.3.0.3.8.1.kver.3.10.0_1160.15.2.el7.x86_64.x86_64.rpm
  *   mlnx-ofa_kernel-5.3-OFED.5.3.1.0.0.1.ol7u9.x86_64.rpm
  *   mlnx-ofa_kernel-debuginfo-5.3-OFED.5.3.1.0.0.1.ol7u9.x86_64.rpm
  *   mlnx-ofa_kernel-devel-5.3-OFED.5.3.1.0.0.1.ol7u9.x86_64.rpm
  *   mlnx-ofa_kernel-modules-5.3-OFED.5.3.1.0.0.1.kver.3.10.0_1160.15.2.el7.x86_64.x86_64.rpm
  *   ofed-scripts-5.3-OFED.5.3.1.0.0.x86_64.rpm
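
The kernel release these modules were built for is embedded in the file names after ".kver.", so it can be compared against the kernel a node is running. A minimal sketch, assuming (from the names listed above) that hyphens in the release are written as underscores in the file name; kver_of_rpm and matches_kernel are hypothetical helper names:

```shell
# Hypothetical helper: print the kernel release embedded in an RPM file name
# of the form <name>.kver.<release>.<arch>.rpm (prints nothing otherwise).
kver_of_rpm() {
    printf '%s\n' "$1" | sed -n 's/^.*\.kver\.\(.*\)\.[^.]*\.rpm$/\1/p'
}

# Hypothetical helper: succeed if the RPM ($1) was built for kernel
# release $2, where $2 is in the form printed by `uname -r`.
matches_kernel() {
    [ "$(kver_of_rpm "$1")" = "$(printf '%s\n' "$2" | sed 's/-/_/g')" ]
}

# Possible usage (not executed here):
#   for f in "$HOME"/releases/mofed/*.kver.*.rpm; do
#       matches_kernel "$(basename "$f")" "$(uname -r)" || echo "mismatch: $f"
#   done
```

On a node booted into the 3.10.0-1160.15.2.el7_lustre.x86_64 kernel, the files listed above would be reported as mismatches by this check; whether that explains the ksym errors is not established here.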


Lustre Server packages

./configure --enable-server \
--with-linux=/usr/src/kernels/*_lustre.x86_64 \
--with-o2ib=/usr/src/ofa_kernel/default

make rpms



  *   kmod-lustre-2.14.51-1.el7.x86_64.rpm
  *   kmod-lustre-osd-ldiskfs-2.14.51-1.el7.x86_64.rpm
  *   kmod-lustre-tests-2.14.51-1.el7.x86_64.rpm
  *   lustre-2.14.51-1.el7.x86_64.rpm
  *   lustre-2.14.51-1.src.rpm
  *   lustre-debuginfo-2.14.51-1.el7.x86_64.rpm
  *   lustre-devel-2.14.51-1.el7.x86_64.rpm
  *   lustre-iokit-2.14.51-1.el7.x86_64.rpm
  *   lustre-osd-ldiskfs-mount-2.14.51-1.el7.x86_64.rpm
  *   lustre-resource-agents-2.14.51-1.el7.x86_64.rpm
  *   lustre-tests-2.14.51-1.el7.x86_64.rpm


Lustre Client packages

./configure --disable-server --enable-client \
--with-linux=/usr/src/kernels/*_lustre.x86_64 \
--with-o2ib=/usr/src/ofa_kernel/default

make rpms


  *   kmod-lustre-client-2.14.51-1.el7.x86_64.rpm
  *   kmod-lustre-client-tests-2.14.51-1.el7.x86_64.rpm
  *   lustre-2.14.51-1.src.rpm
  *   lustre-client-2.14.51-1.el7.x86_64.rpm
  *   lustre-client-debuginfo-2.14.51-1.el7.x86_64.rpm
  *   lustre-client-devel-2.14.51-1.el7.x86_64.rpm
  *   lustre-client-tests-2.14.51-1.el7.x86_64.rpm
  *   lustre-iokit-2.14.51-1.el7.x86_64.rpm




_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
