> Here the failover is designed in such a way that the IP address moves (fails over) with the OST and becomes active on the other server.
This is probably the source of your problem. I would suggest assigning a unique IP address to each OSS; a sketch of what that could look like follows the quoted tunefs output below.

Chris Horn

From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org> on behalf of Backer <backer.k...@gmail.com>
Date: Tuesday, November 5, 2024 at 10:19 PM
To: Backer via lustre-discuss <lustre-discuss@lists.lustre.org>, lustre-de...@lists.lustre.org <lustre-de...@lists.lustre.org>
Subject: Re: [lustre-discuss] Lustre switching to loop back lnet interface when it is not desired

Any ideas on how to avoid using 0@lo as failover_nids? Please see below.

On Tue, 5 Nov 2024 at 12:34, Backer <backer.k...@gmail.com> wrote:

Hi,

I am mounting the Lustre file system on the OSS. Some of the OSTs are locally attached to the OSS. The failover IP on the OST is "10.99.100.152", which is a local LNet NID on the OSS. However, when the client mounts the file system, the import automatically changes to 0@lo. This is undesirable because when the OST fails over to another server, the client keeps trying to connect to 0@lo even though the OST is no longer on the same host, and the client mount hangs forever. The failover is designed in such a way that the IP address moves (fails over) with the OST and becomes active on the other server. How can I make the import point to the real IP rather than the loopback, so that failover works?

[oss000 ~]$ lfs df
UUID                 1K-blocks      Used  Available Use% Mounted on
fs-MDT0000_UUID       29068444     25692   26422344   1% /mnt/fs[MDT:0]
fs-OST0000_UUID       50541812  30160292   17743696  63% /mnt/fs[OST:0]
fs-OST0001_UUID       50541812  29301740   18602248  62% /mnt/fs[OST:1]
fs-OST0002_UUID       50541812  29356508   18547480  62% /mnt/fs[OST:2]
fs-OST0003_UUID       50541812   8822980   39081008  19% /mnt/fs[OST:3]

filesystem_summary:  202167248  97641520   93974432  51% /mnt/fs

[oss000 ~]$ df -h
Filesystem                  Size  Used Avail Use% Mounted on
devtmpfs                     30G     0   30G   0% /dev
tmpfs                        30G  8.1M   30G   1% /dev/shm
tmpfs                        30G   25M   30G   1% /run
tmpfs                        30G     0   30G   0% /sys/fs/cgroup
/dev/mapper/ocivolume-root   36G   17G   19G  48% /
/dev/sdc2                  1014M  637M  378M  63% /boot
/dev/mapper/ocivolume-oled   10G  2.5G  7.6G  25% /var/oled
/dev/sdc1                   100M  5.1M   95M   6% /boot/efi
tmpfs                       5.9G     0  5.9G   0% /run/user/987
tmpfs                       5.9G     0  5.9G   0% /run/user/0
/dev/sdb                     49G   28G   18G  62% /fs-OST0001
/dev/sda                     49G   29G   17G  63% /fs-OST0000
tmpfs                       5.9G     0  5.9G   0% /run/user/1000
10.99.100.221@tcp1:/fs      193G   94G   90G  51% /mnt/fs

[oss000 ~]$ sudo tunefs.lustre --dryrun /dev/sda
checking for existing Lustre data: found

   Read previous values:
Target:     fs-OST0000
Index:      0
Lustre FS:  fs
Mount type: ldiskfs
Flags:      0x1002
            (OST no_primnode )
Persistent mount opts: ,errors=remount-ro
Parameters: mgsnode=10.99.100.221@tcp1 failover.node=10.99.100.152@tcp1,10.99.100.152@tcp1

   Permanent disk data:
Target:     fs-OST0000
Index:      0
Lustre FS:  fs
Mount type: ldiskfs
Flags:      0x1002
            (OST no_primnode )
Persistent mount opts: ,errors=remount-ro
Parameters: mgsnode=10.99.100.221@tcp1 failover.node=10.99.100.152@tcp1,10.99.100.152@tcp1

exiting before disk write.
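If each server keeps its own permanent address and only the OST storage moves between them, the service NIDs recorded on the target could list the two servers' distinct NIDs rather than the single floating IP twice. A minimal sketch, assuming 10.99.100.152@tcp1 stays on oss000; 10.99.100.153@tcp1 is a placeholder for the partner server's own NID, not an address from this thread:

# Sketch only: run on the server with the OST unmounted, and keep --dryrun
# until the printed "Permanent disk data" parameters look right.
# 10.99.100.153@tcp1 is a placeholder for the failover partner's own NID.
tunefs.lustre --erase-params \
    --mgsnode=10.99.100.221@tcp1 \
    --servicenode=10.99.100.152@tcp1 \
    --servicenode=10.99.100.153@tcp1 \
    --dryrun /dev/sda

Dropping --dryrun writes the new parameters; because the NIDs recorded in the MGS configuration logs change, a --writeconf regeneration is typically needed as well so that existing clients learn the new service NIDs.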
[oss000 proc]# cat /proc/fs/lustre/osc/fs-OST0000-osc-ffff89c57672e000/import
import:
    name: fs-OST0000-osc-ffff89c57672e000
    target: fs-OST0000_UUID
    state: IDLE
    connect_flags: [ write_grant, server_lock, version, request_portal, max_byte_per_rpc, early_lock_cancel, adaptive_timeouts, lru_resize, alt_checksum_algorithm, fid_is_enabled, version_recovery, grant_shrink, full20, layout_lock, 64bithash, object_max_bytes, jobstats, einprogress, grant_param, lvb_type, short_io, lfsck, bulk_mbits, second_flags, lockaheadv2, increasing_xid, client_encryption, lseek, reply_mbits ]
    connect_data:
       flags: 0xa0425af2e3440078
       instance: 39
       target_version: 2.15.3.0
       initial_grant: 8437760
       max_brw_size: 4194304
       grant_block_size: 4096
       grant_inode_size: 32
       grant_max_extent_size: 67108864
       grant_extent_tax: 24576
       cksum_types: 0xf7
       max_object_bytes: 17592186040320
    import_flags: [ replayable, pingable, connect_tried ]
    connection:
       failover_nids: [ 0@lo, 0@lo ]
       current_connection: 0@lo
       connection_attempts: 1
       generation: 1
       in-progress_invalidations: 0
       idle: 36 sec
    rpcs:
       inflight: 0
       unregistering: 0
       timeouts: 0
       avg_waittime: 2627 usec
    service_estimates:
       services: 1 sec
       network: 1 sec
    transactions:
       last_replay: 0
       peer_committed: 0
       last_checked: 0
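As a side note on why 0@lo shows up at all: when the node doing the mount owns the address listed as the failover NID, LNet treats that NID as local and the connection is made over the loopback NID. A quick way to confirm this on the OSS, shown here only as a sketch using the standard LNet tooling:

# List the NIDs configured on this node; if 10.99.100.152@tcp1 appears
# here, the import for the locally attached OST will resolve to 0@lo.
lctl list_nids
lnetctl net show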