The root cause was an incorrect RoCE configuration on a client: the business 
traffic was running on priority 0 (it should have been on priority 3), and 
therefore contended with Corosync's heartbeat traffic.
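
For anyone hitting the same symptom, a sketch of how the priority mapping can 
be checked and corrected on a Mellanox/NVIDIA NIC. This assumes the OFED tools 
(mlnx_qos, cma_roce_tos) are installed; the interface name enp0s5f0np0 comes 
from the lustre.conf below, while the RDMA device name mlx5_0 and the ToS 
106 / DSCP 26 -> priority 3 mapping follow the common lossless-RoCE recipe and 
may differ on your hardware:

```shell
# Trust DSCP markings on the port instead of L2 PCP (assumption: mlx5 NIC)
mlnx_qos -i enp0s5f0np0 --trust dscp

# Tag RDMA CM connections (which o2ib LNet uses) with ToS 106, i.e. DSCP 26,
# which maps to priority 3 under the default dscp2prio table
cma_roce_tos -d mlx5_0 -t 106

# Verify: re-run the workload and check that the per-priority counters for
# prio 3 (not prio 0) are the ones incrementing
mlnx_qos -i enp0s5f0np0
```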



From: zufei chen
Date: 2024-11-25 22:42
To: lustre-discuss
Subject: Issue with High-Load Write Operations in Lustre Cluster
Dear Lustre Community,
I am encountering an issue with the Lustre high-availability component, 
Corosync, which experiences packet loss under high load, triggering fencing and 
powering down Lustre nodes. I am seeking advice on how to resolve this issue. 
Below are the details of our environment and the problem:
Environment:
Lustre version: 2.15.5
Physical machines: 11 machines, each with 128 CPU cores and 376GB of memory.
Virtualization: Each physical machine runs a KVM virtual machine with 20 cores 
and 128GB of memory, using Rocky Linux 8.10.
Lustre setup: Each VM has 2 MDTs (512GB each) and 16 OSTs (670GB each).
Configuration (/etc/modprobe.d/lustre.conf):
options lnet networks="o2ib(enp0s5f0np0)"
options libcfs cpu_npartitions=2
options ost oss_num_threads=512
options mds mds_num_threads=512
options ofd adjust_blocks_percent=11

Network: 100 Gb/s RDMA (RoCE) network.
Clients: 11 clients using vdbench to perform large-file writes (aggregate 
write bandwidth approximately 50 GB/s).
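
A quick sanity check of the setup above, in case it helps others reproduce: 
these standard Lustre/LNet commands (no site-specific names assumed) confirm 
which interface and NIDs o2ib is actually using and expose LNet's own 
drop/retry counters under load:

```shell
lctl list_nids        # each node should report <ip>@o2ib
lnetctl net show -v   # per-NID interface, credits, and health stats
lnetctl stats show    # message/drop counters; watch these during the vdbench run
```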
Issue:
Under high load write operations, the Corosync component experiences packet 
loss. There is a probability that heartbeat loss triggers Pacemaker's fencing 
mechanism, which powers down the Lustre nodes.
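
Independent of the underlying network contention, Corosync can be made more 
tolerant of transient loss so that a brief stall does not immediately escalate 
to fencing. A hedged corosync.conf sketch (the defaults are roughly token: 
1000 ms plus token_coefficient: 650 ms per node beyond two, and 4 retransmits; 
the values below are illustrative, not a recommendation):

```
totem {
    # Allow ~10s of token loss before declaring a node dead
    token: 10000
    # Retransmit the token more times before giving up
    token_retransmits_before_loss_const: 10
}
```

Note this only masks the symptom; if the heartbeat traffic is being starved, 
fixing the QoS/priority configuration is the real remedy.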
Analysis Conducted:
CPU usage: CPU utilization is not very high, but the load average is very 
high (around 400).
Packet loss: There is packet loss observed when pinging between Lustre nodes.
Tuning oss_num_threads and mds_num_threads: Reducing these values lowered the 
system load and significantly reduced packet loss, but it also decreased the 
vdbench write bandwidth.
Network tuning: After increasing net.ipv4.udp_mem to roughly three times the 
default, packet loss improved but still persists:
sysctl -w net.ipv4.udp_mem="9217055 12289407 18434106"
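
Two small sanity checks related to the observations above, as a sketch 
("oss02" is a placeholder hostname, and the 4096-byte page size is an 
assumption; check with `getconf PAGE_SIZE`):

```shell
# 1) Quantify the loss between two Lustre nodes while vdbench runs:
#    extract the percentage from ping's summary line.
parse_loss() {
  grep -oE '[0-9]+(\.[0-9]+)?% packet loss' | cut -d% -f1
}
# Live use:  ping -c 100 -i 0.2 oss02 | parse_loss
echo "100 packets transmitted, 93 received, 7% packet loss, time 19928ms" | parse_loss
# -> 7

# 2) net.ipv4.udp_mem is expressed in PAGE_SIZE units, not bytes, so the
#    "min" threshold above is already large:
page=4096       # assumed page size
min_pages=9217055
echo $(( min_pages * page / 1024 / 1024 / 1024 ))GiB
# -> 35GiB
```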

Assistance Requested:
I would appreciate any suggestions from the community on how to resolve this 
issue effectively. If anyone has faced similar challenges, your insights would 
be especially valuable.
Thank you for your time and assistance. I look forward to your responses.
Best regards
_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
