On Fri, 6 May 2022 at 20:04, Andreas Dilger <[email protected]> wrote:
> MOFED is usually preferred over in-kernel OFED, it is just tested and fixed a
> lot more.

Fair enough. However, is the 2.12.8-ib tree built with all the features? Specifically:

https://downloads.whamcloud.com/public/lustre/lustre-2.12.8-ib/MOFED-4.9-4.1.7.0/el7/server/

If I compare the ib_srp module from 2.12 in-kernel:

[root@astrofs-oss3 ~]# find /lib/modules/`uname -r` -name ib_srp.ko.xz
/lib/modules/3.10.0-1160.49.1.el7_lustre.x86_64/kernel/drivers/infiniband/ulp/srp/ib_srp.ko.xz
[root@astrofs-oss3 ~]# rpm -qf /lib/modules/3.10.0-1160.49.1.el7_lustre.x86_64/kernel/drivers/infiniband/ulp/srp/ib_srp.ko.xz
kernel-3.10.0-1160.49.1.el7_lustre.x86_64
[root@astrofs-oss3 ~]# modinfo ib_srp
filename:       /lib/modules/3.10.0-1160.49.1.el7_lustre.x86_64/kernel/drivers/infiniband/ulp/srp/ib_srp.ko.xz
license:        Dual BSD/GPL
description:    InfiniBand SCSI RDMA Protocol initiator
author:         Roland Dreier
retpoline:      Y
rhelversion:    7.9
srcversion:     1FB80E3A962EE7F39AD3959
depends:        ib_core,scsi_transport_srp,ib_cm,rdma_cm
intree:         Y
vermagic:       3.10.0-1160.49.1.el7_lustre.x86_64 SMP mod_unload modversions
signer:         CentOS Linux kernel signing key
sig_key:        FA:A3:27:4B:D9:17:36:F0:FD:43:6A:42:1B:6A:A4:FA:FE:D0:AC:FA
sig_hashalgo:   sha256
parm:           srp_sg_tablesize:Deprecated name for cmd_sg_entries (uint)
parm:           cmd_sg_entries:Default number of gather/scatter entries in the SRP command (default is 12, max 255) (uint)
parm:           indirect_sg_entries:Default max number of gather/scatter entries (default is 12, max is 2048) (uint)
parm:           allow_ext_sg:Default behavior when there are more than cmd_sg_entries S/G entries after mapping; fails the request when false (default false) (bool)
parm:           topspin_workarounds:Enable workarounds for Topspin/Cisco SRP target bugs if != 0 (int)
parm:           prefer_fr:Whether to use fast registration if both FMR and fast registration are supported (bool)
parm:           register_always:Use memory registration even for contiguous memory regions (bool)
parm:           never_register:Never register memory (bool)
parm:           reconnect_delay:Time between successive reconnect attempts
parm:           fast_io_fail_tmo:Number of seconds between the observation of a transport layer error and failing all I/O. "off" means that this functionality is disabled.
parm:           dev_loss_tmo:Maximum number of seconds that the SRP transport should insulate transport layer errors. After this time has been exceeded the SCSI host is removed. Should be between 1 and SCSI_DEVICE_BLOCK_MAX_TIMEOUT if fast_io_fail_tmo has not been set. "off" means that this functionality is disabled.
parm:           ch_count:Number of RDMA channels to use for communication with an SRP target. Using more than one channel improves performance if the HCA supports multiple completion vectors. The default value is the minimum of four times the number of online CPU sockets and the number of completion vectors supported by the HCA. (uint)
parm:           use_blk_mq:Use blk-mq for SRP (bool)
[root@astrofs-oss3 ~]#

.. it all looks normal and capable of mounting our exascaler LUNs.

Compare that with the one from 2.12.8-ib:

==========================================================================================
 Package                    Arch     Version                   Repository           Size
==========================================================================================
Installing:
 kernel                     x86_64   3.10.0-1160.49.1.el7_lustre  lustre-2.12-mofed    50 M
 kmod-lustre-osd-ldiskfs    x86_64   2.12.8_6_g5457c37-1.el7   lustre-2.12-mofed   469 k
 lustre                     x86_64   2.12.8_6_g5457c37-1.el7   lustre-2.12-mofed   805 k
Installing for dependencies:
 kmod-lustre                x86_64   2.12.8_6_g5457c37-1.el7   lustre-2.12-mofed   3.9 M
 kmod-mlnx-ofa_kernel       x86_64   4.9-OFED.4.9.4.1.7.1      lustre-2.12-mofed   1.3 M
 lustre-osd-ldiskfs-mount   x86_64   2.12.8_6_g5457c37-1.el7   lustre-2.12-mofed    15 k
 mlnx-ofa_kernel            x86_64   4.9-OFED.4.9.4.1.7.1      lustre-2.12-mofed   108 k

[root@astrofs-oss1 ~]# find /lib/modules/`uname -r` -name ib_srp.ko.xz
/lib/modules/3.10.0-1160.49.1.el7_lustre.x86_64/kernel/drivers/infiniband/ulp/srp/ib_srp.ko.xz
[root@astrofs-oss1 ~]# rpm -qf /lib/modules/3.10.0-1160.49.1.el7_lustre.x86_64/kernel/drivers/infiniband/ulp/srp/ib_srp.ko.xz
kernel-3.10.0-1160.49.1.el7_lustre.x86_64
[root@astrofs-oss1 ~]# modinfo ib_srp
filename:       /lib/modules/3.10.0-1160.49.1.el7_lustre.x86_64/extra/mlnx-ofa_kernel/drivers/infiniband/ulp/srp/ib_srp.ko
version:        4.9-4.1.7
license:        Dual BSD/GPL
description:    ib_srp dummy kernel module
author:         Alaa Hleihel
retpoline:      Y
rhelversion:    7.9
srcversion:     9ACAA2F5216D9D9FC379EC8
depends:        mlx_compat
vermagic:       3.10.0-1160.49.1.el7_lustre.x86_64 SMP mod_unload modversions
[root@astrofs-oss1 ~]#

which doesn't actually seem to be able to take any of the normal ib_srp parameters:

[root@astrofs-oss1 ~]# modprobe ib_srp
modprobe: ERROR: could not insert 'ib_srp': Unknown symbol in module, or unknown parameter (see dmesg)

[  238.194931] ib_srp: Unknown parameter `cmd_sg_entries'

etc.

Any suggestions? I quickly tried installing another mlnx-ofa_kernel (from http://downloads.linux.hpe.com/SDR/repo/mlnx_ofed/RHEL/7.9/x86_64/4.9-4.1.7.0/) but got the same dummy module.

_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
