Have you run tests on the IB network to confirm that it is performing at the speed you expect? If the network looks OK, it might be worth running IOR tests with file-per-process (FPP) to see what performance you can get; in general, FPP tends to perform better than a single shared file (SSF). I know you have set your stripe size to match the block size to reduce contention, but FPP numbers would still be a useful data point.
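For the network side, two layers are worth exercising separately: raw IB bandwidth with the perftest tools, and the LNet path with lnet_selftest. A minimal sketch, where the oss01 host name, the mlx5_0 device, and the NIDs are placeholders to replace with your own:

```
# Raw InfiniBand bandwidth between a client and an OSS (perftest package).
# On the OSS:
ib_write_bw -d mlx5_0 --report_gbits
# On the client (oss01 is a placeholder host name):
ib_write_bw -d mlx5_0 --report_gbits oss01

# LNet-level bandwidth with lnet_selftest (load the module on every node involved).
modprobe lnet_selftest
export LST_SESSION=$$
lst new_session bw_check
lst add_group clients 192.168.1.10@o2ib   # placeholder client NID
lst add_group servers 192.168.1.20@o2ib   # placeholder OSS NID
lst add_batch bulk_rw
lst add_test --batch bulk_rw --concurrency 8 --from clients --to servers brw write size=1M
lst run bulk_rw
lst stat clients servers   # Ctrl-C to stop the rolling output
lst end_session
```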
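For the FPP comparison itself, reusing the same IOR geometry with -F added keeps the FPP and SSF results directly comparable (the process count here is illustrative):

```
# IOR file-per-process: -F gives each rank its own file; block and transfer
# sizes match the original shared-file run.
mpirun --hostfile mphost10 -np 32 -map-by node \
  ior -w -r -F -b 2m -t 2m -C -s 4000 -e -o /lustre/testFS/ior/iortest_fpp
```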
--Rick

On 7/31/24, 10:53 PM, "lustre-discuss on behalf of Cong Le Duy" <con...@innotech-vn.com> wrote:

Hi all,

I am testing a Lustre system that includes 1 MGS, 2 MDS, and 8 OSS with 8 OSTs running RAID 6 (8d+2p). Each OST's performance is approximately 16 GB/s for WRITE and 33 GB/s for READ (measured with FIO: blocksize=1m, iodepth=64, numjobs=2, sequential). The system has 16 clients.

I am encountering issues with performance testing using IOR with the following options:

```
mpirun --allow-run-as-root --mca pml ucx -x UCX_TLS=rc_mlx5,ud_mlx5,self -x UCX_NET_DEVICES=mlx5_0:1 --mca btl ^openib --hostfile mphost10 -np <number_of_process> -map-by node ior -w -r -b 2m -t 2m -C -s 4000 -k -e -o /lustre/testFS/ior/iortest
```

The stripe_count is set equal to the number of processes (overstriping), and the stripe_size is equal to the block size (2m). The issues I am facing are:

1. Performance does not increase beyond 2 processes per client. With 1 client and 1 OST, I achieve approximately 2 GB/s for WRITE. With 2 clients and 4 processes, I achieve 4 GB/s. To reach 16 GB/s, I need to use 16 clients with 2 processes per client.

```
Stripe count   NP   Write (MB/s)   Read (MB/s)
           1    1        1843.57       1618.57
           1    2        2079.28       1914.32
           2    2        2579.28       2298.19
           2    4        1337.38       1310.23
          16   16        1313.24       1345.24
          16   32        1455.45       1398.23
          32   32        1477.75       1410.68
         800   32        1326.41       1210.13
```

2. Performance does not improve when adding more OSTs. With 2 OSTs and 2 clients, performance remains at 4 GB/s, and with 16 clients the performance is only equivalent to that of 1 OST.

I am wondering why performance does not scale beyond 2 processes per client. Could it be that overstriping alone is not sufficient to improve performance in Single Shared File (SSF) mode? Are there any additional settings I should consider beyond overstriping? The results of obdfilter-survey and LNet tests do not show any bottleneck.

I am using Lustre 2.15.4 on Rocky Linux 8.9 with kernel 4.18.0-513.9.1.el8_lustre.x86_64.

- MGS/MDS/OSS: 16 CPUs, 32 GB RAM each.
- Clients: 2x AMD EPYC 7662 (64 cores each), 512 GB RAM.

The network connection is InfiniBand with 400 Gbps bandwidth. Other settings on the Lustre cluster:

```
# Clients:
options lnet networks="o2ib(ib0)"
options ko2iblnd peer_credits=32 peer_credits_hiw=16 credits=256 concurrent_sends=64
lctl set_param osc.*.max_pages_per_rpc=4096
lctl set_param osc.*.checksums=0
lctl set_param osc.*.max_rpcs_in_flight=16

# OSSs:
options lnet networks="o2ib(ib0)"
options libcfs cpu_npartitions=1
options ko2iblnd peer_credits=32 peer_credits_hiw=16 credits=256 concurrent_sends=64 nscheds=8
options ost oss_num_threads=128
lctl set_param *.*.brw_size=16
lctl set_param osd-ldiskfs.*.writethrough_cache_enable=0
lctl set_param osd-ldiskfs.*.read_cache_enable=0

# MGS/MDSs:
options lnet networks="o2ib(ib0)"
options libcfs cpu_npartitions=1
options ko2iblnd peer_credits=32 peer_credits_hiw=16 credits=256 concurrent_sends=64 nscheds=8
```

Thank you for your help.
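For reference, the overstriped shared-file layout described above can be created explicitly with lfs setstripe; overstriping (-C) requires Lustre 2.13 or later, and the counts here are illustrative rather than a recommendation:

```
# Overstripe the shared file: -C 32 allocates 32 stripe objects across the
# 8 OSTs (4 objects per OST); -S 2m matches the 2m IOR transfer size.
lfs setstripe -C 32 -S 2m /lustre/testFS/ior/iortest

# Confirm the layout that was actually applied:
lfs getstripe /lustre/testFS/ior/iortest
```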