Hi Ceph-users,

I am having some trouble finding the bottleneck in my CephFS Infernalis 
setup.

I am running 5 OSD servers with 6 OSDs each (so 30 OSDs in total). Each OSD is 
a physical disk (non-SSD) and each OSD has its journal stored on the first 
partition of its own disk. I have 3 mon servers and 2 MDS servers which are set 
up in active/passive mode. All servers have a redundant 10G NIC configuration.

I am monitoring all resources on each server (CPU / memory / network / disk 
usage) and I would expect the first bottleneck to be OSD disk speed, but 
looking at my graphs that is not the case. I have plenty of CPU / memory / 
network / disk bandwidth left, yet I am still not able to get better 
performance. The Ceph cluster reports that it is healthy. I have all settings 
at their defaults except for osd_op_threads, which I have raised from the 
default of 2 to 20.
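
For reference, this is how I understand the running value can be checked on an 
OSD host via the admin socket, and (my assumption) adjusted at runtime with 
injectargs; OSD id 3 is just an example and the default socket path is assumed:

[root@XXXX ~]# ceph daemon osd.3 config show | grep osd_op_threads   # value currently in effect
[root@XXXX ~]# ceph tell osd.* injectargs '--osd_op_threads 25'      # push a new value to all OSDs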

Looking at the processes on my OSD servers, you can see the expected processes 
running:

[root@XXXX ~]# ps ajxf | grep ceph-osd
 2497 25505 25504  2476 pts/0    25504 S+       0   0:00                      
\_ grep --color=auto ceph-osd
    1 10051 10051 10051 ?           -1 Ssl    167 15584:14 /usr/bin/ceph-osd -f 
--cluster ceph --id 3 --setuser ceph --setgroup ceph
    1 11587 11587 11587 ?           -1 Ssl    167 14991:09 /usr/bin/ceph-osd -f 
--cluster ceph --id 4 --setuser ceph --setgroup ceph
    1 12551 12551 12551 ?           -1 Ssl    167 14687:16 /usr/bin/ceph-osd -f 
--cluster ceph --id 5 --setuser ceph --setgroup ceph
    1 18895 18895 18895 ?           -1 Ssl    167 3052:43 /usr/bin/ceph-osd -f 
--cluster ceph --id 22 --setuser ceph --setgroup ceph
    1 20788 20788 20788 ?           -1 Ssl    167 3314:31 /usr/bin/ceph-osd -f 
--cluster ceph --id 23 --setuser ceph --setgroup ceph
    1 27220 27220 27220 ?           -1 Ssl    167 2240:37 /usr/bin/ceph-osd -f 
--cluster ceph --id 26 --setuser ceph --setgroup ceph


Looking at the number of threads used by the process for OSD id 5 (PID 12551 
above), for instance, you can see this:

[root@XXXX ~]# ps huH p 12551 | wc -l
349
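
To get a rough idea of what those threads are, I assume breaking the count 
down by thread name would help (my assumption being that the op worker threads 
share a common name, so the per-name counts should mirror the configured 
pools):

[root@XXXX ~]# ps -L -p 12551 -o comm= | sort | uniq -c | sort -rn   # thread count per thread name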

I would expect this number to vary depending on the load on the cluster. When 
increasing osd_op_threads to 25, I see 354 threads for that OSD id, so the 
increase is applied, but what are all the other threads? Is there an easy way 
for me to see whether the configured maximum number of op threads is currently 
being reached? Or is there any other bottleneck that I am overlooking?
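
To make the question concrete, what I have in mind is something along these 
lines (pulled from the admin socket of one OSD plus the cluster-wide view), 
but I do not know which of the counters, if any, reflects the op threads being 
saturated:

[root@XXXX ~]# ceph daemon osd.5 perf dump            # op queue / latency counters
[root@XXXX ~]# ceph daemon osd.5 dump_historic_ops    # slowest recent ops with per-stage timings
[root@XXXX ~]# ceph osd perf                          # per-OSD commit / apply latency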

Any clear view on this would be appreciated.

Kind regards,

Davie De Smet


Davie De Smet
Director Technical Operations and Customer Services, Nomadesk
davie.des...@nomadesk.com
+32 9 240 10 31 (Office)

Join Nomadesk:  Facebook<http://www.facebook.com/Nomadesk> | 
Twitter<http://twitter.com/#!/nomadesk>
