My testing cluster is an all-HDD cluster with 12 OSDs (one 10 TB HDD each).
I monitored Luminous 12.2.2 write performance and OSD memory usage with
Grafana graphs of the statistics logging.
The test is done by running fio on a mounted RBD with the following parameters:
fio -directory=fiotest -direct=1 -thread -rw=write -ioengine=libaio -size=200G
-group_reporting -bs=1m -iodepth 4 -numjobs=200 -name=writetest
I found that there is a noticeable performance degradation over time.
Graph of write throughput and IOPS
https://pasteboard.co/GZflpTO.png
Graph of OSD memory usage (2 of 12 OSDs; the pattern is identical across all of them)
https://pasteboard.co/GZfmfzo.png
Graph of OSD perf
https://pasteboard.co/GZfmZNx.png
There are some interesting findings in the graphs.
After 18:00 the write throughput suddenly dropped and the OSD latency
increased. TCMalloc started reclaiming its page heap freelist much more
frequently. All of this happened very quickly, and every OSD showed the
identical pattern.
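In case it helps anyone reproduce the observation: the per-OSD TCMalloc heap state can be inspected at runtime with Ceph's heap admin commands (osd.0 below is just an example ID):

```shell
# Show TCMalloc heap statistics (in-use bytes, page heap freelist size)
ceph tell osd.0 heap stats

# Ask TCMalloc to release free pages back to the operating system
ceph tell osd.0 heap release
```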
I have run this kind of test several times with different bluestore
cache settings and found that with a larger cache the performance
degradation happens later.
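For reference, these are the Luminous bluestore cache options I was varying; the values below are only illustrative, not a recommendation:

```
[osd]
# Total bluestore cache size for HDD-backed OSDs (illustrative: 3 GiB)
bluestore_cache_size_hdd = 3221225472
# Fraction of the cache used for RocksDB key/value data (illustrative)
bluestore_cache_kv_ratio = 0.5
# Fraction of the cache used for onode metadata (illustrative)
bluestore_cache_meta_ratio = 0.5
```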
I don't know whether this is a bug or something I can fix by changing
my cluster's configuration.
Any advice or direction to look into is appreciated.
Thanks
2017-12-21
lin.yunfan
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com