Hi, I have a strange issue: the OSDs from one specific server are introducing a huge performance issue.
This is a brand new installation on 3 identical servers: DELL R620 with PERC H710, bluestore DB and WAL on SSD, 10 Gb dedicated private/public networks.

When I add the OSDs, I see gaps like the ones below and huge latency.

atop provides no clear culprit EXCEPT very low network and per-disk utilization BUT 100% DSK for the ceph-osd process, which stays like that (100%) for the duration of the test (see below).

Not sure why the ceph-osd process DSK stays at 100% while all the individual disks (sdb, sde, etc.) are 1% busy?

Any help / instructions on how to troubleshoot this will be appreciated.

(apologies if the format is not being kept)

CPU | sys 4% | user 1% | | irq 1% | | idle 794% | wait 0% | | | steal 0% | guest 0% | curf 2.20GHz | | curscal ?% |
CPL | avg1 0.00 | | avg5 0.00 | avg15 0.00 | | | | csw 547/s | | intr 832/s | | | numcpu 8 | |
MEM | tot 62.9G | free 61.4G | cache 520.6M | dirty 0.0M | buff 7.5M | slab 98.9M | slrec 64.8M | shmem 8.8M | shrss 0.0M | shswp 0.0M | vmbal 0.0M | | hptot 0.0M | hpuse 0.0M |
SWP | tot 6.0G | free 6.0G | | | | | | | | | | vmcom 1.5G | | vmlim 37.4G |
LVM | dm-0 | busy 1% | | read 0/s | write 54/s | | KiB/r 0 | KiB/w 455 | MBr/s 0.0 | | MBw/s 24.0 | avq 3.69 | | avio 0.14 ms |
DSK | sdb | busy 1% | | read 0/s | write 102/s | | KiB/r 0 | KiB/w 240 | MBr/s 0.0 | | MBw/s 24.0 | avq 6.69 | | avio 0.08 ms |
DSK | sda | busy 0% | | read 0/s | write 12/s | | KiB/r 0 | KiB/w 4 | MBr/s 0.0 | | MBw/s 0.1 | avq 1.00 | | avio 0.05 ms |
DSK | sde | busy 0% | | read 0/s | write 0/s | | KiB/r 0 | KiB/w 0 | MBr/s 0.0 | | MBw/s 0.0 | avq 1.00 | | avio 2.50 ms |
NET | transport | tcpi 718/s | tcpo 972/s | udpi 0/s | | udpo 0/s | tcpao 0/s | tcppo 0/s | tcprs 21/s | tcpie 0/s | tcpor 0/s | | udpnp 0/s | udpie 0/s |
NET | network | ipi 719/s | | ipo 399/s | ipfrw 0/s | | deliv 719/s | | | | | icmpi 0/s | | icmpo 0/s |
NET | eth5 1% | pcki 2214/s | pcko 939/s | | sp 10 Gbps | si 154 Mbps | so 52 Mbps | | coll 0/s | mlti 0/s | erri 0/s | erro 0/s | drpi 0/s | drpo 0/s |
NET | eth4 0% | pcki 712/s | pcko 54/s | | sp 10 Gbps | si 50 Mbps | so 90 Kbps | | coll 0/s | mlti 0/s | erri 0/s | erro 0/s | drpi 0/s | drpo 0/s |

PID   TID  RDDSK  WRDSK    WCANCL  DSK   CMD       1/21
2067  -    0K/s   0.0G/s   0K/s    100%  ceph-osd

2018-04-05 10:55:24.316549 min lat: 0.0203278 max lat: 10.7501 avg lat: 0.496822
sec  Cur ops  started  finished  avg MB/s  cur MB/s  last lat(s)  avg lat(s)
 40       16     1096      1080   107.988         0            -    0.496822
 41       16     1096      1080   105.354         0            -    0.496822
 42       16     1096      1080   102.846         0            -    0.496822
 43       16     1096      1080   100.454         0            -    0.496822
 44       16     1205      1189   108.079   48.4444    0.0430396    0.588127
 45       16     1234      1218   108.255       116    0.0318717    0.575485
 46       16     1234      1218   105.901         0            -    0.575485
 47       16     1234      1218   103.648         0            -    0.575485
 48       16     1234      1218   101.489         0            -    0.575485
 49       16     1261      1245   101.622        27     0.157469    0.604268
 50       16     1335      1319   105.508       296     0.191907    0.604862
 51       16     1418      1402   109.949       332    0.0367004    0.573429
 52       16     1437      1421   109.296        76     0.031818    0.566289
 53       16     1481      1465   110.554       176    0.0405567    0.564885
 54       16     1516      1500   111.099       140    0.0272873    0.552698
 55       16     1516      1500   109.079         0            -    0.552698
 56       16     1516      1500   107.131         0            -    0.552698
 57       16     1516      1500   105.252         0            -    0.552698
 58       16     1555      1539   106.127        39      0.15675    0.601747
Total time run:       58.971664
Total reads made:     1565
Read size:            4194304
Object size:          4194304
Bandwidth (MB/sec):   106.153
Average IOPS:         26
Stddev IOPS:          33
Max IOPS:             121
Min IOPS:             0
Average Latency(s):   0.600788
Max latency(s):       10.7501
Min latency(s):       0.019135

megacli -LDGetProp -cache -Lall -a0

Adapter 0-VD 0(target id: 0): Cache Policy:WriteThrough, ReadAheadNone, Direct, Write Cache OK if bad BBU
Adapter 0-VD 1(target id: 1): Cache Policy:WriteBack, ReadAdaptive, Cached, No Write Cache if bad BBU
Adapter 0-VD 2(target id: 2): Cache Policy:WriteBack, ReadAdaptive, Cached, No Write Cache if bad BBU
Adapter 0-VD 3(target id: 3): Cache Policy:WriteBack, ReadAdaptive, Cached, No Write Cache if bad BBU
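
In case it helps anyone compare runs, here is a minimal Python sketch for pulling the stalled intervals (seconds where cur MB/s is 0) out of the per-second rados bench rows above. It is not part of the test itself. The function name `find_stalls` and the assumption that each per-second row has exactly 8 whitespace-separated columns (sec, cur ops, started, finished, avg MB/s, cur MB/s, last lat, avg lat) are mine, taken from the layout of the output shown here; other rados bench versions may print different columns.

```python
def find_stalls(bench_text):
    """Return a list of (first_sec, last_sec) ranges where cur MB/s was 0.

    Assumes each per-second row has 8 whitespace-separated fields and that
    the sixth field is the 'cur MB/s' column (layout assumed from the post).
    """
    stalls = []
    start = None   # first second of the stall currently being tracked
    last = None    # most recent stalled second seen
    for line in bench_text.splitlines():
        fields = line.split()
        # Skip headers, timestamps, and summary lines.
        if len(fields) != 8 or not fields[0].isdigit():
            continue
        sec = int(fields[0])
        try:
            cur_mbs = float(fields[5])
        except ValueError:
            continue
        if cur_mbs == 0:
            if start is None:
                start = sec
            last = sec
        elif start is not None:
            # Throughput resumed, so close the current stall range.
            stalls.append((start, last))
            start = None
    if start is not None:
        stalls.append((start, last))
    return stalls


# A few rows copied from the bench output in this post.
SAMPLE = """\
sec Cur ops started finished avg MB/s cur MB/s last lat(s) avg lat(s)
40 16 1096 1080 107.988 0 - 0.496822
41 16 1096 1080 105.354 0 - 0.496822
42 16 1096 1080 102.846 0 - 0.496822
43 16 1096 1080 100.454 0 - 0.496822
44 16 1205 1189 108.079 48.4444 0.0430396 0.588127
45 16 1234 1218 108.255 116 0.0318717 0.575485
46 16 1234 1218 105.901 0 - 0.575485
47 16 1234 1218 103.648 0 - 0.575485
48 16 1234 1218 101.489 0 - 0.575485
49 16 1261 1245 101.622 27 0.157469 0.604268
"""

if __name__ == "__main__":
    for first, last in find_stalls(SAMPLE):
        print(f"stall from second {first} to {last}")
```

Running the same function over bench output captured with and without the suspect server's OSDs in the cluster would give a rough count of stalls per run. It only summarizes the gaps and says nothing about their cause.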
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com