Disk controller seems fine.

Any other suggestions would be really appreciated; a sketch of the next checks I can run is below the megacli output.

megacli -AdpBbuCmd -aAll

BBU status for Adapter: 0

BatteryType: BBU
Voltage: 3925 mV
Current: 0 mA
Temperature: 17 C
Battery State: Optimal
BBU Firmware Status:

  Charging Status              : None
  Voltage                                 : OK
  Temperature                             : OK
  Learn Cycle Requested                   : No
  Learn Cycle Active                      : No
  Learn Cycle Status                      : OK
  Learn Cycle Timeout                     : No
  I2c Errors Detected                     : No
  Battery Pack Missing                    : No
  Battery Replacement required            : No
  Remaining Capacity Low                  : No
  Periodic Learn Required                 : No
  Transparent Learn                       : No
  No space to cache offload               : No
  Pack is about to fail & should be replaced : No
  Cache Offload premium feature required  : No
  Module microcode update required        : No


 megacli -AdpDiag -a0

Performing Diagnostic on Controller 0.
 It will take 20 seconds to complete. Please wait...
Diagnostic Completed on Controller 0.

Exit Code: 0x00
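
In case it helps, these are the non-destructive checks I can run next if anyone
thinks they are useful. This is only a sketch: it assumes smartmontools is
installed, that sdb is one of the OSD data disks, and that osd.0 lives on the
suspect server.

# SMART health of one of the OSD data disks (read-only; behind the PERC this
# may need the megaraid device type, e.g. smartctl -a -d megaraid,0 /dev/sdb)
smartctl -a /dev/sdb

# per-OSD commit/apply latency as reported by Ceph itself
ceph osd perf

# simple per-OSD write benchmark, to compare an OSD on the suspect server
# against the same command run for OSDs on the healthy servers
ceph tell osd.0 bench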



On Fri, 6 Apr 2018 at 15:11, David Turner <drakonst...@gmail.com> wrote:

> First and foremost, have you checked your disk controller? Of most import
> would be your cache battery. Any time I have a single node acting up, the
> controller is Suspect #1.
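
For reference, the controller/BBU checks I ran in response to this are at the
top of this mail. Assuming MegaCli is installed, they were simply:

megacli -AdpBbuCmd -aAll              # battery state and firmware status
megacli -AdpDiag -a0                  # controller self-diagnostic
megacli -LDGetProp -cache -Lall -a0   # per-VD cache policy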
>
> On Thu, Apr 5, 2018 at 11:23 AM Steven Vacaroaia <ste...@gmail.com> wrote:
>
>> Hi,
>>
>> I have a strange issue - OSDs from a specific server are introducing huge
>> performance issues.
>>
>> This is a brand new installation on 3 identical servers: DELL R620 with
>> PERC H710, BlueStore DB and WAL on SSD, and dedicated 10 Gb public/private
>> networks.
>>
>>
>> When I add the OSDs I see gaps like the ones below and huge latency.
>>
>> atop shows no clear culprit EXCEPT very low network and per-disk
>> utilization BUT 100% DSK for the ceph-osd process, which stays at 100% for
>> the duration of the test (see below).
>>
>> Not sure why the ceph-osd process DSK stays at 100% while all the
>> individual disks (sdb, sde, etc.) are only 1% busy?
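
One way I can cross-check this, assuming sysstat is installed, is to compare
the per-device and per-process figures side by side; this is only a sketch:

iostat -x 1     # per-device %util, await and queue size
pidstat -d 1    # per-process read/write rates, to compare with atop's DSK column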
>>
>> Any help / instructions on how to troubleshoot this would be appreciated.
>>
>> (apologies if the formatting is not preserved)
>>
>>
>> CPU | sys 4% | user 1% | irq 1% | idle 794% | wait 0% | steal 0% | guest 0% | curf 2.20GHz | curscal ?%
>> CPL | avg1 0.00 | avg5 0.00 | avg15 0.00 | csw 547/s | intr 832/s | numcpu 8
>> MEM | tot 62.9G | free 61.4G | cache 520.6M | dirty 0.0M | buff 7.5M | slab 98.9M | slrec 64.8M | shmem 8.8M | shrss 0.0M | shswp 0.0M | vmbal 0.0M | hptot 0.0M | hpuse 0.0M
>> SWP | tot 6.0G | free 6.0G | vmcom 1.5G | vmlim 37.4G
>> LVM | dm-0 | busy 1% | read 0/s | write 54/s | KiB/r 0 | KiB/w 455 | MBr/s 0.0 | MBw/s 24.0 | avq 3.69 | avio 0.14 ms
>> DSK | sdb | busy 1% | read 0/s | write 102/s | KiB/r 0 | KiB/w 240 | MBr/s 0.0 | MBw/s 24.0 | avq 6.69 | avio 0.08 ms
>> DSK | sda | busy 0% | read 0/s | write 12/s | KiB/r 0 | KiB/w 4 | MBr/s 0.0 | MBw/s 0.1 | avq 1.00 | avio 0.05 ms
>> DSK | sde | busy 0% | read 0/s | write 0/s | KiB/r 0 | KiB/w 0 | MBr/s 0.0 | MBw/s 0.0 | avq 1.00 | avio 2.50 ms
>> NET | transport | tcpi 718/s | tcpo 972/s | udpi 0/s | udpo 0/s | tcpao 0/s | tcppo 0/s | tcprs 21/s | tcpie 0/s | tcpor 0/s | udpnp 0/s | udpie 0/s
>> NET | network | ipi 719/s | ipo 399/s | ipfrw 0/s | deliv 719/s | icmpi 0/s | icmpo 0/s
>> NET | eth5 1% | pcki 2214/s | pcko 939/s | sp 10 Gbps | si 154 Mbps | so 52 Mbps | coll 0/s | mlti 0/s | erri 0/s | erro 0/s | drpi 0/s | drpo 0/s
>> NET | eth4 0% | pcki 712/s | pcko 54/s | sp 10 Gbps | si 50 Mbps | so 90 Kbps | coll 0/s | mlti 0/s | erri 0/s | erro 0/s | drpi 0/s | drpo 0/s
>>
>>    PID   TID   RDDSK   WRDSK   WCANCL   DSK    CMD        1/21
>>   2067     -    0K/s  0.0G/s     0K/s  100%    ceph-osd
>>
>>
>>
>>
>>
>> 2018-04-05 10:55:24.316549 min lat: 0.0203278 max lat: 10.7501 avg lat: 0.496822
>>   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)  avg lat(s)
>>    40      16      1096      1080   107.988         0            -    0.496822
>>    41      16      1096      1080   105.354         0            -    0.496822
>>    42      16      1096      1080   102.846         0            -    0.496822
>>    43      16      1096      1080   100.454         0            -    0.496822
>>    44      16      1205      1189   108.079   48.4444    0.0430396    0.588127
>>    45      16      1234      1218   108.255       116    0.0318717    0.575485
>>    46      16      1234      1218   105.901         0            -    0.575485
>>    47      16      1234      1218   103.648         0            -    0.575485
>>    48      16      1234      1218   101.489         0            -    0.575485
>>    49      16      1261      1245   101.622        27     0.157469    0.604268
>>    50      16      1335      1319   105.508       296     0.191907    0.604862
>>    51      16      1418      1402   109.949       332    0.0367004    0.573429
>>    52      16      1437      1421   109.296        76     0.031818    0.566289
>>    53      16      1481      1465   110.554       176    0.0405567    0.564885
>>    54      16      1516      1500   111.099       140    0.0272873    0.552698
>>    55      16      1516      1500   109.079         0            -    0.552698
>>    56      16      1516      1500   107.131         0            -    0.552698
>>    57      16      1516      1500   105.252         0            -    0.552698
>>    58      16      1555      1539   106.127        39      0.15675    0.601747
>>
>> Total time run:       58.971664
>> Total reads made:     1565
>> Read size:            4194304
>> Object size:          4194304
>> Bandwidth (MB/sec):   106.153
>> Average IOPS:         26
>> Stddev IOPS:          33
>> Max IOPS:             121
>> Min IOPS:             0
>> Average Latency(s):   0.600788
>> Max latency(s):       10.7501
>> Min latency(s):       0.019135
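
As a sanity check on these numbers: with 4 MiB objects, 106.153 MB/s works out
to roughly 106.153 / 4 ≈ 26.5 objects per second, which matches the reported
average of 26 IOPS. What stands out is the stddev of 33 IOPS, the seconds with
0 IOPS, and the 10.75 s max latency, i.e. the stalls rather than the average
throughput.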
>>
>>
>> megacli -LDGetProp -cache -Lall -a0
>>
>> Adapter 0-VD 0(target id: 0): Cache Policy:WriteThrough, ReadAheadNone, Direct, Write Cache OK if bad BBU
>> Adapter 0-VD 1(target id: 1): Cache Policy:WriteBack, ReadAdaptive, Cached, No Write Cache if bad BBU
>> Adapter 0-VD 2(target id: 2): Cache Policy:WriteBack, ReadAdaptive, Cached, No Write Cache if bad BBU
>> Adapter 0-VD 3(target id: 3): Cache Policy:WriteBack, ReadAdaptive, Cached, No Write Cache if bad BBU
>>
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
