Hi all,

 

I am setting up Lustre on a local cluster based on an InfiniBand HDR (200Gb/s, single-port) network.

I was able to set up Lustre successfully using 5 servers (1 MDS, 4 OSSs).

 

Even though I have verified the IB HDR bandwidth (200Gb/s) with the 'ib_read_bw' and 'ib_write_bw' tools (pinned to CPU #0 for the test),

when I run an LNet selftest between the servers it only reaches around 100Gb/s (about 12GB/s, just half of the maximum bandwidth):

~12GB/s for the read test and ~13GB/s for the write test.
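
For reference, the raw-fabric check was along these lines (the HCA name mlx5_0 and the exact flags are only an example of how the perftest tools can be invoked; the core pinning matches the CPU #0 note above):

# server side
taskset -c 0 ib_write_bw -d mlx5_0 -F --report_gbits
# client side, pointing at the server's IPoIB address
taskset -c 0 ib_write_bw -d mlx5_0 -F --report_gbits 192.168.8.6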

 

So I tried changing the LNet tunables and pinned the IB network to CPT [0] with the kernel module options below,

but it makes no significant difference in the LNet selftest results.

 

It seems as though LNet is not fully exploiting the HDR or PCIe Gen4 hardware.

Can anyone advise why the LNet performance is not reaching the HDR bandwidth?

Or are there specific options or tunables that I should modify?

Please share your experience if you have set up Lustre on an HDR network.

 

Thank you.

 

-----------lustre.conf---------------

options lnet networks=o2ib0(ib0)[0]

 

-----------ko2iblnd.conf-------------

options ko2iblnd peer_credits=256 peer_credits_hiw=64 credits=1024 concurrent_sends=256 ntx=2048 map_on_demand=0 fmr_pool_size=2048 fmr_flush_trigger=512 fmr_cache=1 conns_per_peer=1
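
For completeness, these options only take effect after the LNet/Lustre modules are reloaded; on the test nodes that is roughly:

lustre_rmmod                  # unload all Lustre and LNet modules
modprobe lnet
lctl network up               # or: lnetctl lnet configure --all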

 

----------lnet tunables--------------

tunables:
              peer_timeout: 180
              peer_credits: 255
              peer_buffer_credits: 0
              credits: 1024
              peercredits_hiw: 127
              map_on_demand: 0
              concurrent_sends: 256
              fmr_pool_size: 2048
              fmr_flush_trigger: 512
              fmr_cache: 1
              ntx: 2048
              conns_per_peer: 1
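
The tunables above are quoted from the verbose LNet net listing, i.e. something like:

lnetctl net show -v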

 

I have listed the HW/SW environment I used below.

 

------------------------------------------------------------

[Environment]

- CPUs: 2x AMD EPYC 7302 (dual socket, PCIe Gen4 capable)

- OS: CentOS 8.3 (kernel: 4.18.0-240.1.1.el8_lustre.x86_64)

- Lustre: 2.14.0 (downloaded from repository https://downloads.whamcloud.com/public/lustre/lustre-2.14.0-ib/ )

- OFED driver: tried two versions, MLNX_OFED_LINUX-5.2-1.0.4.0 and MLNX_OFED_LINUX-5.4-1.0.3.0

 

Finally, here is the LNet selftest script I used for the tests.

I tried changing the concurrency, but the bandwidth saturates once CN >= 4.

 

 

----------------------------------------------------------

#!/bin/bash

# Concurrency
CN=32
#Size
SZ=1M
# Length of time to run test (secs)
TM=20
# Which BRW test to run (read or write)
BRW=read
# Checksum calculation (simple or full)
CKSUM=simple

# The LST "from" list -- e.g. Lustre clients. Space separated list of NIDs.
LFROM="192.168.8.4@o2ib0"
#LFROM=${LFROM:?ERROR: the LFROM variable is not set}
# The LST "to" list -- e.g. Lustre servers. Space separated list of NIDs.
LTO="192.168.8.6@o2ib0"
#LTO=${LTO:?ERROR: the LTO variable is not set}

### End of customisation.

export LST_SESSION=$$
echo LST_SESSION = ${LST_SESSION}
lst new_session lst${BRW}
lst add_group lfrom ${LFROM}
lst add_group lto ${LTO}
lst add_batch bulk_${BRW}
lst add_test --batch bulk_${BRW} --distribute 3:1 --from lfrom --to lto brw ${BRW} \
  --concurrency=${CN} check=${CKSUM} size=${SZ}
lst run bulk_${BRW}
echo -n "Capturing statistics for ${TM} secs "
lst stat --mbs lfrom lto &
LSTPID=$!
# Delay loop with interval markers displayed every 5 secs.
# Test time is rounded up to the nearest 5 seconds.
i=1
j=$((${TM}/5))
if [ $((${TM}%5)) -ne 0 ]; then let j++; fi
while [ $i -le $j ]; do
  sleep 5
  let i++
done
kill ${LSTPID} && wait ${LSTPID} >/dev/null 2>&1
echo
lst show_error lfrom lto
lst stop bulk_${BRW}
lst end_session
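
In case it matters, the lnet_selftest module is loaded on both nodes before the script is run (the script file name below is just a placeholder):

modprobe lnet_selftest        # on both the 'from' and 'to' nodes
sh ./lst_brw.sh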

 

 

 

_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
