I ran some communication tests with tcpdump. If I use "telnet server_ip 988", I get a response from the server, but when I use "lctl ping server_ip@o2ib", tcpdump shows nothing. I think the problem is with my InfiniBand communication.
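The TCP half of the test above can be sketched in Python as follows. This is only an illustration under stated assumptions: `tcp_port_open` is a hypothetical helper name, and "server_ip" is the same placeholder used in the message. It checks what the telnet command checks (that TCP port 988 accepts connections over IP) and says nothing about whether the LNet ping over o2ib succeeds.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) can be established.

    Hypothetical helper: equivalent in spirit to `telnet host port`,
    but it only reports whether the connection attempt succeeded.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Example call (replace the placeholder with the real server address):
#   tcp_port_open("server_ip", 988)
```

A True result here only confirms the IP path to port 988; it does not exercise the o2ib network that the mount actually uses.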
Any recommendations?

On 10/8/19 10:04 AM, Americo Ojeda wrote:
> Hi, I would like to know if the Lustre client software is compatible
> with the ppc64le architecture and Mellanox InfiniBand. I think there is
> a problem between Lustre and InfiniBand.
>
> I want to join an IBM Power System Power9 - AC922 node to an existing
> Lustre server (Intel servers). I built the Lustre client software from
> source and installed it successfully, but I can't join this node to the
> existing Lustre service.
>
> Server Node (client)
>
> IBM Power System 9 - AC922
> Red Hat Enterprise Linux Server release 7.5 (Alternate)
> Linux SinergiAC922 4.14.0-49.13.1.el7a.ppc64le #1 SMP Mon Aug 27 07:37:11 EDT 2018 ppc64le ppc64le ppc64le GNU/Linux
> Mellanox Driver Version: 4.5-1.0.1
> Lustre Client 2.12.58
> Compilation: ./configure --disable-server --disable-tests --with-o2ib=/usr/src/ofa_kernel/default
>
> dmesg log:
>
> [163444.797346] Lustre: Lustre: Build Version: 2.12.58_145_gfcf219d
> [163445.007000] LNet: Using FastReg for registration
> [163445.008017] LNet: Added LNI my_ip_address@o2ib [8/256/0/180]
>
> [163460.523709] LNetError: 17267:0:(peer.c:3724:lnet_peer_ni_add_to_recoveryq_locked()) lpni lustre_server_address@o2ib added to recovery queue. Health = 900
> [163460.523775] LNetError: 17267:0:(lib-msg.c:481:lnet_handle_local_failure()) ni my_ip_address@o2ib added to recovery queue. Health = 900
>
> messages log:
>
> Sep 26 11:37:02 AC922 kernel: LNetError: 1404:0:(peer.c:3713:lnet_peer_ni_add_to_recoveryq_locked()) lpni lustre_server_address@o2ib added to recovery queue. Health = 900
> Sep 26 11:37:02 SinergiAC922 kernel: LNetError: 1404:0:(lib-msg.c:481:lnet_handle_local_failure()) ni my_ip_address@o2ib added to recovery queue. Health = 900
> Sep 26 11:37:08 AC922 kernel: LustreError: 73939:0:(mgc_request.c:250:do_config_log_add()) MGClustre_server_address@o2ib: failed processing log, type 1: rc = -5
> Sep 26 11:37:16 AC922 kernel: LustreError: 73949:0:(mgc_request.c:598:do_requeue()) failed processing log: -5
> Sep 26 11:37:39 AC922 kernel: LustreError: 15c-8: MGClustre_server_address@o2ib: Confguration from log testfs-client failed from MGS -5. Communication error between node & MGS, a bad configuration, or other errors. See syslog for more info
> Sep 26 11:37:39 AC922 kernel: Lustre: Unmounted testfs-client
> Sep 26 11:37:39 AC922 kernel: LustreError: 73939:0:(obd_mount.c:1669:lustre_fill_super()) Unable to mount (-5)
>
> _______________________________________________
> lustre-discuss mailing list
> [email protected]
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

--
See the privacy notice at: http://www.sinergiasys.com/aviso-de-privacidad/
