Hi,

We also run a tcp1 config, and our lnet.conf looks like this:

net:
    - net type: tcp1
      local NI(s):
        - nid: <IP>@tcp1
          status: up
          interfaces:
              0: eth0

Replace <IP> with the IP address of the NID. I guess you need "- net type" instead of just "- net".
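
For the client config quoted below, a minimal sketch of the corrected file could look like the following. It assumes eth0 is the interface you want on tcp1, and that the nid and status lines may be left out so that LNet derives the NID from the interface address. As far as I know that works, but treat it as an assumption and check the result with lnetctl net show.

```yaml
# /etc/lnet.conf (sketch, untested on 2.15.5)
net:
    - net type: tcp1        # "net type", not "net"
      local NI(s):
        - interfaces:
              0: eth0       # interface name taken from the original config
```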

Cheers,
Hans Henrik

On 17/09/2024 11.50, Steve Brasier wrote:
Hi.

I've got a Rocky Linux 9.4 client running Lustre 2.15.5-1.el9 with this /etc/lnet.conf:

[root@stg-login-0 rocky]# cat /etc/lnet.conf
net:
    - net: tcp1
        interfaces:
            0: eth0

Running systemctl start lnet hangs forever, with the syslog showing only:
Sep 13 15:31:35 stg-login-0 systemd[1]: Starting lnet management...

and it's actually the command below which hangs:
[root@stg-login-0 rocky]# /usr/sbin/lnetctl import /etc/lnet.conf
i.e. module load and lnet configure work OK.

However it looks like it autoconfigured an interface on tcp (not tcp1):
[root@stg-login-0 rocky]# lnetctl net show
net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
    - net type: tcp
      local NI(s):
        - nid: 10.179.2.45@tcp
          status: up

So:
1. How can I debug this hanging please?

2. Do the client and server NIDs need to be in the same IPv4 subnet? I have a client NID of 10.179.2.45@tcp1 and a server NID of 10.167.128.1@tcp1, with IP routing between them such that ICMP ping works. Is that OK?

Many thanks for any help!


http://stackhpc.com/
Please note I work Tuesday to Friday.

_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
