My testing cluster is an all-HDD cluster with 12 OSDs (one 10 TB HDD each).
I monitored Luminous 12.2.2 write performance and OSD memory usage with
Grafana graphs of the statistics logging.
The test is done by running fio on a mounted RBD with the following parameters:
fio -directory=fiotest -direct=1 -thread -rw=write -ioengine=libaio -size=200G
-group_reporting -bs=1m -iodepth 4 -numjobs=200 -name=writetest
I found that there is a noticeable performance degradation over time.
Graph of write throughput and IOPS
https://pasteboard.co/GZflpTO.png
Graph of OSD memory usage (2 of 12 OSDs; the pattern is identical across all of them)
https://pasteboard.co/GZfmfzo.png
Graph of OSD perf
https://pasteboard.co/GZfmZNx.png
There are some interesting findings in the graphs.
After 18:00 the write throughput suddenly dropped and the OSD latency
increased. TCMalloc started reclaiming its page heap freelist much more
frequently. All of this happened very quickly, and every OSD showed the
identical pattern.
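In case it helps anyone reproduce the observation: the per-OSD TCMalloc heap state can be inspected at runtime with Ceph's heap admin commands (osd.0 below is just an example ID):

```shell
# Show TCMalloc heap statistics (in-use bytes, page heap freelist size)
ceph tell osd.0 heap stats

# Ask TCMalloc to release free pages back to the operating system
ceph tell osd.0 heap release
```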
I have run this kind of test several times with different bluestore
cache settings and found that with a larger cache the performance
degradation happens later.
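For reference, these are the Luminous bluestore cache options I was varying; the values below are only illustrative, not a recommendation:

```
[osd]
# Total bluestore cache size for HDD-backed OSDs (illustrative: 3 GiB)
bluestore_cache_size_hdd = 3221225472
# Fraction of the cache used for RocksDB key/value data (illustrative)
bluestore_cache_kv_ratio = 0.5
# Fraction of the cache used for onode metadata (illustrative)
bluestore_cache_meta_ratio = 0.5
```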
I don't know whether this is a bug or something I can fix by changing
my cluster's configuration.
Any advice or direction to look into is appreciated.
Thanks
2017-12-21
lin.yunfan
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com