Hi,

You are measuring the read cache of your client: a 56 Gb/s IB link tops out at roughly 7 GB/s, so the 88 GB/s read figure is bogus.
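A quick sanity check on the link math (a sketch; it assumes FDR InfiniBand's 64b/66b line encoding and ignores all protocol overhead):

```python
# Rough payload ceiling for a 56 Gb/s (4x FDR) InfiniBand link.
signal_gbps = 56            # 4 lanes at 14 Gb/s each
encoding = 64 / 66          # FDR uses 64b/66b line encoding
data_gbps = signal_gbps * encoding    # ~54.3 Gb/s of payload
max_gbytes_per_s = data_gbps / 8      # ~6.8 GB/s

print(f"theoretical ceiling: {max_gbytes_per_s:.1f} GB/s")
```

Note how the ~6800 MB/s you measured with lnet_selftest sits right at this ceiling, while 88 GB/s is more than an order of magnitude above it; that data can only have come from local memory.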
https://ior.readthedocs.io/en/latest/userDoc/tutorial.html#effect-of-page-cache-on-benchmarking

Drop your cache before the read phase, or use multiple clients so that no client re-reads what it has just written.

On Wed, Apr 16, 2025 at 2:41 AM evancervj via lustre-discuss <[email protected]> wrote:

> Hi,
>
> I have been working on benchmarking Lustre with IOR on a 4-node cluster
> and have encountered an issue where the observed write bandwidth is
> significantly lower than the read bandwidth. Below are the setup details
> for the cluster:
>
> 1. 1 MGS/MDS node with:
>    1. Linux Kernel 4.18.0-513.9.1.el8_lustre.x86_64
>    2. 800 GB NVMe disk formatted as ldiskfs
>    3. Lustre server v2.15.4
> 2. 2 OSS nodes, 1 OST on each node, with:
>    1. Linux Kernel 4.18.0-513.9.1.el8_lustre.x86_64
>    2. 800 GB NVMe disk formatted as ldiskfs
>    3. Lustre server v2.15.4
> 3. 1 Lustre client with:
>    1. Lustre v2.15.6
>    2. Linux Kernel 5.14.0-503.11.1.el9_5.x86_64
> 4. Default stripe settings:
>    1. stripe_count: 1 stripe_size: 1048576 pattern: 0 stripe_offset: -1
> 5. Interconnected using a 56 Gbps Mellanox IB network
> 6. Contents of /etc/modprobe.d/lustre.conf:
>
> options lnet networks="o2ib(ib0)"
> options lnet lnet_transaction_timeout=100
> options lnet lnet_retry_count=2
> options ko2iblnd peer_credits=32
> options ko2iblnd peer_credits_hiw=16
> options ko2iblnd concurrent_sends=256
> options ksocklnd conns_per_peer=0
> options ost oss_num_threads=64
>
> I conducted individual tests on the OST nodes using obdfilter-survey. For
> reference, the full summary output of the test is attached.
> - nobjlo=1 nobjhi=512 thrlo=1 thrhi=1024 size=480000 rslt_loc=/var/tmp/obdfilter-survey_out targets="lustrefs-OST0001" case=disk obdfilter-survey
>
> ost 1 sz 491520000K rsz 1024K obj 16 thr 16 write 3377.62 [1428.92, 186681.32] rewrite 154516.51 [147831.54, 186675.48] read 6977.51 [3370.55, 103311.36]
> ost 1 sz 491520000K rsz 1024K obj 16 thr 32 write 3661.83 [1510.83, 192783.49] rewrite 150708.13 [186337.79, 186337.79] read 6951.00 [2917.89, 59171.64]
> ost 1 sz 491520000K rsz 1024K obj 16 thr 64 write 3603.10 [1545.90, 213008.56] rewrite 172656.48 [177891.67, 177891.67] read 6984.14 [3352.78, 57702.04]
> ost 1 sz 491520000K rsz 1024K obj 16 thr 128 write 3692.16 [1594.80, 13478.11] rewrite 149716.18 [106440.28, 225295.61] read 6850.52 [2804.80, 45156.82]
> ost 1 sz 491520000K rsz 1024K obj 16 thr 256 write 3661.13 [1446.88, 223403.23] rewrite 140771.55 [103769.40, 190108.76] read 6964.33 [3357.70, 85623.55]
> ost 1 sz 491257856K rsz 1024K obj 16 thr 512 write 3193.67 [1001.90, 205874.24] rewrite 137435.34 [104790.09, 180991.34] read 6938.31 [3358.61, 54319.14]
> ost 1 sz 490733568K rsz 1024K obj 16 thr 1024 write 2379.98 [454.94, 202684.59] rewrite 130579.85 [100158.02, 161904.29] read 6945.24 [3354.17, 48807.91]
>
> - nobjlo=1 nobjhi=512 thrlo=1 thrhi=1024 size=480000 rslt_loc=/var/tmp/obdfilter-survey_out targets="lustrefs-OST0000" case=disk obdfilter-survey
>
> ost 1 sz 491520000K rsz 1024K obj 16 thr 16 write 3747.17 [1393.84, 190306.68] rewrite 156040.83 [148205.37, 188453.46] read 7009.94 [3398.61, 108528.27]
> ost 1 sz 491520000K rsz 1024K obj 16 thr 32 write 3745.34 [1393.92, 193273.05] rewrite 154722.13 [177941.31, 177941.31] read 6989.40 [3330.82, 30959.14]
> ost 1 sz 491520000K rsz 1024K obj 16 thr 64 write 3760.65 [1367.83, 104560.10] rewrite 162225.64 [148197.30, 148197.30] read 6999.92 [3363.80, 60847.55]
> ost 1 sz 491520000K rsz 1024K obj 16 thr 128 write 3754.86 [1379.88, 56060.15] rewrite 147814.31 [104369.56, 217353.01] read 6990.76 [3330.77, 53634.79]
> ost 1 sz 491520000K rsz 1024K obj 16 thr 256 write 3705.70 [1358.82, 150706.49] rewrite 138369.68 [101585.51, 182624.29] read 6962.05 [3337.70, 73858.34]
> ost 1 sz 491257856K rsz 1024K obj 16 thr 512 write 3612.06 [1275.87, 95958.05] rewrite 134727.11 [105177.20, 172269.61] read 6986.99 [3350.63, 46219.95]
> ost 1 sz 490733568K rsz 1024K obj 16 thr 1024 write 2867.46 [537.87, 53084.22] rewrite 129812.07 [102830.81, 159936.73] read 6987.93 [3335.35, 79355.00]
>
> Network performance was evaluated across the cluster nodes using lnet_selftest, yielding a bandwidth of approximately 6800 MB/s for both read and write operations.
>
> I used IOR-4.0.0 to check the read and write bandwidth of the setup using the following commands. The output is attached for reference.
>
> - mpirun -genvall -np 16 -ppn 16 -f /path_to_hostfile/hosts_rt05 /path_to_ior_bin/ior -F -w -r -e -g -C -w -b 1g -t 1m -i 4 -D 70 -vv -o ./out
>
> Max Write: 1712.75 MiB/sec (1795.95 MB/sec)
> Max Read: 83994.25 MiB/sec (88074.36 MB/sec)
>
> - mpirun -genvall -np 16 -ppn 16 -f /path_to_hostfile/hosts_rt05 /path_to_ior_bin/ior -F -w -r -e -g -C -w -b 256m -t 1m -i 4 -D 70 -vv -o ./out
>
> Max Write: 1633.31 MiB/sec (1712.65 MB/sec)
> Max Read: 73826.50 MiB/sec (77412.69 MB/sec)
>
> The observed write bandwidth of 1800 MB/s is significantly lower than the read bandwidth of 88,000 MB/s. Are there specific configurations that could help enhance write performance? Any suggestions or insights on addressing this disparity would be greatly appreciated.
>
> Thanks
>
> John
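For completeness, a minimal sketch of the fix suggested above. The hostfile name is a placeholder for one listing more than one client node (your current setup has a single client, so this assumes extra client machines), dropping caches requires root, and the IOR flags are taken from the original command:

```shell
# Flush the client page cache so reads hit the OSTs, not local RAM.
# Run as root on every client node before the read phase.
sync
echo 3 > /proc/sys/vm/drop_caches

# Re-run IOR across two client nodes (8 ranks each). With more than one
# node in the hostfile, -C (reorderTasks) makes each rank read a file
# written by a rank on a different node, so locally cached data is never
# re-read.
mpirun -np 16 -ppn 8 -f ./hosts_multi_client \
    ior -F -w -r -e -g -C -b 1g -t 1m -i 4 -D 70 -o ./out
```

With this in place the read numbers should drop back toward the ~6.8 GB/s that lnet_selftest reports for the wire.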
_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
