Hi Gabryel,

Are the pools always using 1X replication?  The rados results are scaling as if they are, but the CephFS results definitely look suspect.  Have you tried turning up the iodepth in addition to tuning numjobs?  Also, is this kernel CephFS or FUSE?  The FUSE client is far slower.  FWIW, on our test cluster with NVMe drives I can get about 60-65GB/s for large sequential writes across 80 OSDs (using 100 client processes with kernel CephFS).  It's definitely possible to scale better than what you are seeing here.
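
As a rough illustration of the iodepth suggestion, here is a minimal sketch that builds one fio command line per queue depth, starting from the fio invocation quoted below. fio's default iodepth is 1, so with libaio each of the 16 jobs keeps only one I/O in flight unless --iodepth is raised. The mount point /mnt/cephfs and the list of depths are assumptions for illustration, not values from this thread. The script only prints the commands; it does not run fio.

```python
import shlex

# Hypothetical values: adjust to the actual CephFS mount point and
# to the queue depths worth testing on your cluster.
CEPHFS_DIR = "/mnt/cephfs"
IODEPTHS = (1, 4, 16, 32)


def build_fio_cmd(iodepth, numjobs=16, directory=CEPHFS_DIR):
    """Return the fio argument list for one sequential-write run."""
    return [
        "fio",
        "--name=seqwrite",
        "--rw=write",
        "--direct=1",
        "--ioengine=libaio",
        "--bs=4M",
        "--size=1G",
        "--group_reporting",
        f"--numjobs={numjobs}",
        f"--iodepth={iodepth}",      # I/Os kept in flight per job
        f"--directory={directory}",  # where fio creates its test files
    ]


# One shell-quoted command line per queue depth.
commands = [shlex.join(build_fio_cmd(depth)) for depth in IODEPTHS]
for line in commands:
    print(line)
```

Running each printed command on the client and comparing the aggregate bandwidth per depth should show whether the CephFS numbers are limited by the number of outstanding I/Os.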


https://docs.google.com/spreadsheets/d/1SpwEk3vB9gWzoxvy-K0Ax4NKbRJwd7W1ip-W-qitLlw/edit?usp=sharing


Mark


On 3/30/20 8:56 AM, Gabryel Mason-Williams wrote:
We have been benchmarking CephFS and comparing it to Rados to measure the
performance difference and how much overhead CephFS has. However, we are
getting odd results with CephFS when using more than one OSD server (each OSD
server has only one disk), whereas with Rados everything appears normal. These
tests are run on the same Ceph cluster.

OSDS    CephFS (Thread 16)    Rados (Thread 16)
1       289                   316
2       139                   546
3       143                   728
4       142                   844
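
To make the scaling in these results easier to see, the sketch below divides each figure by the corresponding single-OSD-server result. The numbers are copied from the table above; the units are not stated in the thread, so only the ratios are meaningful here.

```python
# Results copied from the table above (units not stated in the thread).
osd_servers = [1, 2, 3, 4]
cephfs = [289, 139, 143, 142]
rados = [316, 546, 728, 844]


def speedup(results):
    """Throughput relative to the first (single OSD server) result."""
    base = results[0]
    return [round(r / base, 2) for r in results]


cephfs_speedup = speedup(cephfs)
rados_speedup = speedup(rados)

for n, c, r in zip(osd_servers, cephfs_speedup, rados_speedup):
    print(f"{n} OSD server(s): CephFS x{c}, Rados x{r}")
```

With these figures, Rados reaches roughly 2.67 times its single-server result at four OSD servers, while CephFS drops to about half of its single-server result as soon as a second server is added and stays there.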

CephFS is being benchmarked using:
  fio --name=seqwrite --rw=write --direct=1 --ioengine=libaio --bs=4M --numjobs=16 --size=1G --group_reporting
Rados is being benchmarked using:
  rados bench -p cephfs_data 10 write -t 16

If you could provide some help or insight into why this is happening or how to 
stop it, that would be much appreciated.

Kind regards,

Gabryel
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
