Hi Gabryel,
Are the pools always using 1X replication? The rados results are scaling like it's using 1X but the CephFS results definitely look suspect. Have you tried turning up the iodepth in addition to tuning numjobs? Also is this kernel cephfs or fuse? The fuse client is far slower. FWIW, on our test cluster with NVMe drives I can get about 60-65GB/s for large sequential writes across 80 OSDs (using 100 client processes with kernel cephfs). It's definitely possible to scale better than what you are seeing here.
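To make those checks concrete, here is a rough sketch of the diagnostics being suggested, assuming the pool name cephfs_data from the rados bench command below (adjust names to your cluster; all flags are standard ceph/fio options, but this is untested against your setup):

```shell
# Check the replication factor of the data pool (1 means no replication)
ceph osd pool get cephfs_data size

# Check whether the mount is kernel cephfs ("type ceph") or fuse ("fuse.ceph-fuse")
mount | grep ceph

# Re-run the sequential write test with a deeper queue per job, in addition
# to numjobs -- libaio only queues asynchronously with --direct=1
fio --name=seqwrite --rw=write --direct=1 --ioengine=libaio --bs=4M \
    --numjobs=16 --iodepth=16 --size=1G --group_reporting
```

With iodepth=1 (fio's default) each job has only one 4M write in flight at a time, so per-client latency rather than cluster throughput can end up dominating the result.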
https://docs.google.com/spreadsheets/d/1SpwEk3vB9gWzoxvy-K0Ax4NKbRJwd7W1ip-W-qitLlw/edit?usp=sharing

Mark

On 3/30/20 8:56 AM, Gabryel Mason-Williams wrote:
We have been benchmarking CephFS and comparing it to Rados to see the performance difference and how much overhead CephFS adds. However, we are getting odd results when using more than one OSD server (each OSD server has only one disk) with CephFS, whereas with Rados everything appears normal. These tests are run on the same Ceph cluster.

        CephFS      Rados
OSDS    Thread 16   Thread 16
1       289         316
2       139         546
3       143         728
4       142         844

CephFS is being benchmarked using:

fio --name=seqwrite --rw=write --direct=1 --ioengine=libaio --bs=4M --numjobs=16 --size=1G --group_reporting

Rados is being benchmarked using:

rados bench -p cephfs_data 10 write -t 16

If you could provide some help or insight into why this is happening or how to stop it, that would be much appreciated.

Kind regards,

Gabryel
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io