Have you run tests on the IB network to confirm it is performing at the speed 
you expect?  If the network seems OK, it might be worth doing IOR tests with 
file per process (FPP) to see what performance you can get.  In general, FPP 
tends to perform better than a single shared file (SSF).  I know you have set 
your stripe size to match the block size to reduce contention, but FPP tests 
might still be useful.
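For example, something along these lines (just a sketch -- the device name and 
hostfile are taken from your mail, and <server> is a placeholder): verify raw 
IB bandwidth with the perftest tools, then rerun IOR in file-per-process mode 
with -F. 
``` 
# Raw IB bandwidth between two nodes (perftest); run the first command on the
# server node, the second on the client node pointing at it
ib_write_bw -d mlx5_0 --report_gbits
ib_write_bw -d mlx5_0 --report_gbits <server>

# IOR file-per-process: same block/transfer sizes as your SSF run, but -F
# writes one file per rank instead of a single shared file
mpirun --hostfile mphost10 -np <number_of_process> -map-by node \
    ior -w -r -b 2m -t 2m -s 4000 -k -e -F -o /lustre/testFS/ior/fpp_test 
``` 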

--Rick


On 7/31/24, 10:53 PM, "lustre-discuss on behalf of Cong Le Duy" 
<lustre-discuss-boun...@lists.lustre.org on behalf of 
con...@innotech-vn.com> wrote:


Hi all,


I am testing a Lustre system that includes 1 MGS, 2 MDS, and 8 OSS with 8 OSTs 
running RAID 6 (8d+2p). Each OST delivers approximately 16 GB/s WRITE and 
33 GB/s READ (measured with FIO: blocksize=1m, iodepth=64, numjobs=2, 
sequential). The system has 16 clients. 
I am encountering issues with performance testing using IOR with the following 
options: 
``` 
mpirun --allow-run-as-root --mca pml ucx -x UCX_TLS=rc_mlx5,ud_mlx5,self \
    -x UCX_NET_DEVICES=mlx5_0:1 --mca btl ^openib --hostfile mphost10 \
    -np <number_of_process> -map-by node \
    ior -w -r -b 2m -t 2m -C -s 4000 -k -e -o /lustre/testFS/ior/iortest
``` 


The stripe_count is set equal to the number of processes (overstriping), and 
the stripe_size is equal to the block size (2m). The issues I am facing are: 


1. Performance does not increase beyond 2 processes per client. With 1 client 
and 1 OST, I achieve approximately 2 GB/s for WRITE. With 2 clients and 4 
processes, I achieve 4 GB/s. To reach 16 GB/s, I need to use 16 clients with 2 
processes per client. 
Stripe count   NP   Write (MB/s)   Read (MB/s)
1              1    1843.57        1618.57
1              2    2079.28        1914.32
2              2    2579.28        2298.19
2              4    1337.38        1310.23
16             16   1313.24        1345.24
16             32   1455.45        1398.23
32             32   1477.75        1410.68
800            32   1326.41        1210.13




2. Performance does not improve when adding more OSTs. With 2 OSTs and 2 
clients, the performance remains at 4 GB/s, and with 16 clients the performance 
is only equivalent to that of 1 OST. 
I am wondering why the performance does not scale after 2 processes per client. 
Could it be that overstriping alone is not sufficient to enhance performance 
for Single Shared File mode? Are there any additional settings I should 
consider configuring beyond overstriping? 
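
For reference, an overstriped layout like the one described is typically 
applied roughly as follows (a sketch; the lfs setstripe -C overstriping option 
is available in Lustre 2.13+, and the count here is a placeholder): 
``` 
# Overstriping: -C sets the total stripe count and allows more stripes than
# OSTs; -S matches the stripe size to the 2m IOR block/transfer size
lfs setstripe -C <number_of_process> -S 2m /lustre/testFS/ior
lfs getstripe /lustre/testFS/ior    # verify the layout before the run
``` 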


The results of obdfilter-survey and lnet do not show any bottleneck. 
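
A typical obdfilter-survey run (from lustre-iokit) looks roughly like the 
sketch below; the parameters are illustrative rather than the exact ones used 
here. 
``` 
# Local disk case on one OSS: sweep up to thrhi threads and nobjhi objects
# per OST, writing 'size' MB per OST; values are illustrative
nobjhi=2 thrhi=64 size=16384 case=disk sh obdfilter-survey
``` 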


I am using Lustre 2.15.4 with Rocky Linux 8.9 and kernel 
4.18.0-513.9.1.el8_lustre.x86_64.
- MGS/MDS/OSS nodes: 16 CPUs, 32 GB RAM. 
- Clients: 2x AMD EPYC 7662 (64 cores each), 512 GB RAM.
The network connection is InfiniBand with 400 Gbps bandwidth. 
Other settings on the Lustre cluster: 
``` 
# Clients: 
options lnet networks="o2ib(ib0)" 
options ko2iblnd peer_credits=32 peer_credits_hiw=16 credits=256 concurrent_sends=64 
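# Note: max_pages_per_rpc=4096 (with 4 KiB pages) allows 16 MiB bulk RPCs,
# matching brw_size=16 (MB) on the OSS side; checksums=0 disables data
# checksums; max_rpcs_in_flight limits concurrent RPCs per OST connection.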
lctl set_param osc.*.max_pages_per_rpc=4096 
lctl set_param osc.*.checksums=0 
lctl set_param osc.*.max_rpcs_in_flight=16 


# OSSs:
options lnet networks="o2ib(ib0)" 
options libcfs cpu_npartitions=1 
options ko2iblnd peer_credits=32 peer_credits_hiw=16 credits=256 concurrent_sends=64 nscheds=8 
options ost oss_num_threads=128 
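# Note: brw_size is in MB, so 16 enables 16 MiB bulk RPCs; the two cache
# settings below disable the OSS read and writethrough page caches.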
lctl set_param *.*.brw_size=16 
lctl set_param osd-ldiskfs.*.writethrough_cache_enable=0 
lctl set_param osd-ldiskfs.*.read_cache_enable=0 


# MGS – MDSs 
options lnet networks="o2ib(ib0)" 
options libcfs cpu_npartitions=1 
options ko2iblnd peer_credits=32 peer_credits_hiw=16 credits=256 concurrent_sends=64 nscheds=8 
``` 
Thank you for your help. 







_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
