Hi, I would like to know whether the Lustre client software is compatible with the ppc64le architecture and Mellanox InfiniBand. I suspect the problem is between Lustre and InfiniBand.

 

I want to join an IBM Power System AC922 (POWER9) node to an existing Lustre service (Intel servers). I built the Lustre client software from source and installed it successfully, but I can't join this node to the existing Lustre service.

 

Client node:
  • IBM Power System 9 - AC922
  • Red Hat Enterprise Linux Server release 7.5 (Alternate)
  • Linux SinergiAC922 4.14.0-49.13.1.el7a.ppc64le #1 SMP Mon Aug 27 07:37:11 EDT 2018 ppc64le ppc64le ppc64le GNU/Linux
  • Mellanox Driver Version: 4.5-1.0.1
  • Lustre Client 2.12.58
  • Compilation: ./configure --disable-server --disable-tests --with-o2ib=/usr/src/ofa_kernel/default

dmesg log:

 

[163444.797346] Lustre: Lustre: Build Version: 2.12.58_145_gfcf219d
[163445.007000] LNet: Using FastReg for registration
[163445.008017] LNet: Added LNI my_ip_address@o2ib [8/256/0/180]

[163460.523709] LNetError: 17267:0:(peer.c:3724:lnet_peer_ni_add_to_recoveryq_locked()) lpni lustre_server_address@o2ib added to recovery queue. Health = 900
[163460.523775] LNetError: 17267:0:(lib-msg.c:481:lnet_handle_local_failure()) ni my_ip_address@o2ib added to recovery queue. Health = 900

 

messages log:

 

Sep 26 11:37:02 AC922 kernel: LNetError: 1404:0:(peer.c:3713:lnet_peer_ni_add_to_recoveryq_locked()) lpni lustre_server_address@o2ib added to recovery queue. Health = 900
Sep 26 11:37:02 SinergiAC922 kernel: LNetError: 1404:0:(lib-msg.c:481:lnet_handle_local_failure()) ni my_ip_address@o2ib added to recovery queue. Health = 900
Sep 26 11:37:08 AC922 kernel: LustreError: 73939:0:(mgc_request.c:250:do_config_log_add()) MGClustre_server_address@o2ib: failed processing log, type 1: rc = -5
Sep 26 11:37:16 AC922 kernel: LustreError: 73949:0:(mgc_request.c:598:do_requeue()) failed processing log: -5
Sep 26 11:37:39 AC922 kernel: LustreError: 15c-8: MGClustre_server_address@o2ib: Confguration from log testfs-client failed from MGS -5. Communication error between node & MGS, a bad configuration, or other errors. See syslog for more info
Sep 26 11:37:39 AC922 kernel: Lustre: Unmounted testfs-client
Sep 26 11:37:39 AC922 kernel: LustreError: 73939:0:(obd_mount.c:1669:lustre_fill_super()) Unable to mount  (-5)
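
A note on the return code in these messages: the kernel reports negative Linux errno values, so the -5 above corresponds to errno 5 (EIO), a generic I/O error rather than anything specific to ppc64le. A minimal Python sketch for decoding such codes on a Linux host; the helper name decode_rc is only for illustration and is not part of Lustre:

```python
import errno
import os


def decode_rc(rc):
    """Map a negative kernel return code (e.g. -5) to (errno name, description).

    decode_rc is a hypothetical helper, not a Lustre tool.
    """
    code = abs(rc)
    # errno.errorcode maps numeric errno values to their symbolic names.
    name = errno.errorcode.get(code, "UNKNOWN")
    # os.strerror returns the platform's text for that errno value.
    return name, os.strerror(code)


name, text = decode_rc(-5)
print(name, text)
```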

 
 
--
 
Americo Ojeda
