Hello Jan

On Wed, Sep 9, 2015 at 11:59 AM, Jan Schermer <j...@schermer.cz> wrote:
> Just to recapitulate - the nodes are doing "nothing" when it drops to
> zero? Not flushing something to drives (iostat)? Not cleaning pagecache
> (kswapd and similar)? Not out of any type of memory (slab,
> min_free_kbytes)? Not network link errors, no bad checksums (those are hard
> to spot, though)?
>
> Unless you find something I suggest you try disabling offloads on the NICs
> and see if the problem goes away.

Could you please elaborate on this point: how do you disable offloads on the
NIC? What does that mean, how is it done, and how would it help? Sorry, I
don't know about this.

- Vickey -
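For reference, NIC offloads are usually listed and toggled with ethtool; a
minimal sketch, assuming the interface is eth0 (feature names vary by driver,
and the change does not persist across a reboot):

# ethtool -k eth0
    (lists the current offload settings: tso, gso, gro, lro, tx/rx checksumming, ...)
# ethtool -K eth0 tso off gso off gro off lro off
# ethtool -K eth0 tx off rx off
    (turns off the segmentation/receive offloads and the TX/RX checksum offloads;
    re-run the benchmark afterwards to see whether the stalls go away)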
>
> Jan
>
> > On 08 Sep 2015, at 18:26, Lincoln Bryant <linco...@uchicago.edu> wrote:
> >
> > For whatever it’s worth, my problem has returned and is very similar to
> > yours. Still trying to figure out what’s going on over here.
> >
> > Performance is nice for a few seconds, then goes to 0. This is a similar
> > setup to yours (12 OSDs per box, Scientific Linux 6, Ceph 0.94.3, etc.)
> >
> >  384      16     29520     29504   307.287      1188  0.0492006   0.208259
> >  385      16     29813     29797   309.532      1172  0.0469708   0.206731
> >  386      16     30105     30089   311.756      1168  0.0375764   0.205189
> >  387      16     30401     30385   314.009      1184   0.036142   0.203791
> >  388      16     30695     30679   316.231      1176  0.0372316   0.202355
> >  389      16     30987     30971    318.42      1168  0.0660476   0.200962
> >  390      16     31282     31266   320.628      1180  0.0358611   0.199548
> >  391      16     31568     31552   322.734      1144  0.0405166   0.198132
> >  392      16     31857     31841   324.859      1156  0.0360826   0.196679
> >  393      16     32090     32074   326.404       932  0.0416869    0.19549
> >  394      16     32205     32189   326.743       460  0.0251877   0.194896
> >  395      16     32302     32286   326.897       388  0.0280574   0.194395
> >  396      16     32348     32332   326.537       184  0.0256821   0.194157
> >  397      16     32385     32369   326.087       148  0.0254342   0.193965
> >  398      16     32424     32408   325.659       156  0.0263006   0.193763
> >  399      16     32445     32429   325.054        84  0.0233839   0.193655
> > 2015-09-08 11:22:31.940164 min lat: 0.0165045 max lat: 67.6184 avg lat: 0.193655
> >  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat    avg lat
> >  400      16     32445     32429   324.241         0         -   0.193655
> >  401      16     32445     32429   323.433         0         -   0.193655
> >  402      16     32445     32429   322.628         0         -   0.193655
> >  403      16     32445     32429   321.828         0         -   0.193655
> >  404      16     32445     32429   321.031         0         -   0.193655
> >  405      16     32445     32429   320.238         0         -   0.193655
> >  406      16     32445     32429    319.45         0         -   0.193655
> >  407      16     32445     32429   318.665         0         -   0.193655
> >
> > Needless to say, very strange.
> >
> > —Lincoln
> >
> >
> >> On Sep 7, 2015, at 3:35 PM, Vickey Singh <vickey.singh22...@gmail.com> wrote:
> >>
> >> Adding ceph-users.
> >>
> >> On Mon, Sep 7, 2015 at 11:31 PM, Vickey Singh <vickey.singh22...@gmail.com> wrote:
> >>
> >> On Mon, Sep 7, 2015 at 10:04 PM, Udo Lembke <ulem...@polarzone.de> wrote:
> >> Hi Vickey,
> >> Thanks for your time in replying to my problem.
> >>
> >> I had the same rados bench output after changing the motherboard of the
> >> monitor node with the lowest IP... Due to the new mainboard, I assume the
> >> hw-clock was wrong during startup. Ceph health showed no errors, but none
> >> of the VMs were able to do IO (very high load on the VMs - but no traffic).
> >> I stopped that mon, but it didn't change anything. I had to restart all the
> >> other mons to get IO again. After that I started the first mon as well
> >> (with the right time now) and everything worked fine again...
> >>
> >> Thanks, I will try restarting all OSDs / MONs and report back if it
> >> solves my problem.
> >>
> >> Another possibility:
> >> Do you use journals on SSDs? Perhaps the SSDs can't keep up with writes
> >> because of garbage collection?
> >>
> >> No, I don't have journals on SSD; they are on the same OSD disk.
> >>
> >> Udo
> >>
> >> On 07.09.2015 16:36, Vickey Singh wrote:
> >>> Dear Experts
> >>>
> >>> Can someone please help me understand why my cluster is not able to write data?
> >>>
> >>> See the output below: cur MB/s is 0 and avg MB/s keeps decreasing.
> >>>
> >>> Ceph Hammer 0.94.2
> >>> CentOS 6 (3.10.69-1)
> >>>
> >>> The Ceph status says OPS are blocked. I have tried checking everything I know:
> >>>
> >>> - System resources (CPU, net, disk, memory) -- all normal
> >>> - 10G network for public and cluster network -- no saturation
> >>> - All disks are physically healthy
> >>> - No messages in /var/log/messages or dmesg
> >>> - Tried restarting the OSDs that are blocking operations, but no luck
> >>> - Tried writing through RBD and rados bench; both give the same problem
> >>>
> >>> Please help me to fix this problem.
> >>> # rados bench -p rbd 60 write
> >>> Maintaining 16 concurrent writes of 4194304 bytes for up to 60 seconds or 0 objects
> >>> Object prefix: benchmark_data_stor1_1791844
> >>>  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat    avg lat
> >>>    0       0         0         0         0         0         -          0
> >>>    1      16       125       109   435.873       436  0.022076  0.0697864
> >>>    2      16       139       123   245.948        56  0.246578  0.0674407
> >>>    3      16       139       123   163.969         0         -  0.0674407
> >>>    4      16       139       123   122.978         0         -  0.0674407
> >>>    5      16       139       123    98.383         0         -  0.0674407
> >>>    6      16       139       123   81.9865         0         -  0.0674407
> >>>    7      16       139       123   70.2747         0         -  0.0674407
> >>>    8      16       139       123   61.4903         0         -  0.0674407
> >>>    9      16       139       123   54.6582         0         -  0.0674407
> >>>   10      16       139       123   49.1924         0         -  0.0674407
> >>>   11      16       139       123   44.7201         0         -  0.0674407
> >>>   12      16       139       123   40.9934         0         -  0.0674407
> >>>   13      16       139       123   37.8401         0         -  0.0674407
> >>>   14      16       139       123   35.1373         0         -  0.0674407
> >>>   15      16       139       123   32.7949         0         -  0.0674407
> >>>   16      16       139       123   30.7451         0         -  0.0674407
> >>>   17      16       139       123   28.9364         0         -  0.0674407
> >>>   18      16       139       123   27.3289         0         -  0.0674407
> >>>   19      16       139       123   25.8905         0         -  0.0674407
> >>> 2015-09-07 15:54:52.694071 min lat: 0.022076 max lat: 0.46117 avg lat: 0.0674407
> >>>  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat    avg lat
> >>>   20      16       139       123    24.596         0         -  0.0674407
> >>>   21      16       139       123   23.4247         0         -  0.0674407
> >>>   22      16       139       123     22.36         0         -  0.0674407
> >>>   23      16       139       123   21.3878         0         -  0.0674407
> >>>   24      16       139       123   20.4966         0         -  0.0674407
> >>>   25      16       139       123   19.6768         0         -  0.0674407
> >>>   26      16       139       123     18.92         0         -  0.0674407
> >>>   27      16       139       123   18.2192         0         -  0.0674407
> >>>   28      16       139       123   17.5686         0         -  0.0674407
> >>>   29      16       139       123   16.9628         0         -  0.0674407
> >>>   30      16       139       123   16.3973         0         -  0.0674407
> >>>   31      16       139       123   15.8684         0         -  0.0674407
> >>>   32      16       139       123   15.3725         0         -  0.0674407
> >>>   33      16       139       123   14.9067         0         -  0.0674407
> >>>   34      16       139       123   14.4683         0         -  0.0674407
> >>>   35      16       139       123   14.0549         0         -  0.0674407
> >>>   36      16       139       123   13.6645         0         -  0.0674407
> >>>   37      16       139       123   13.2952         0         -  0.0674407
> >>>   38      16       139       123   12.9453         0         -  0.0674407
> >>>   39      16       139       123   12.6134         0         -  0.0674407
> >>> 2015-09-07 15:55:12.697124 min lat: 0.022076 max lat: 0.46117 avg lat: 0.0674407
> >>>  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat    avg lat
> >>>   40      16       139       123   12.2981         0         -  0.0674407
> >>>   41      16       139       123   11.9981         0         -  0.0674407
> >>>
> >>>
> >>>   cluster 86edf8b8-b353-49f1-ab0a-a4827a9ea5e8
> >>>    health HEALTH_WARN
> >>>           1 requests are blocked > 32 sec
> >>>    monmap e3: 3 mons at {stor0111=10.100.1.111:6789/0,stor0113=10.100.1.113:6789/0,stor0115=10.100.1.115:6789/0}
> >>>           election epoch 32, quorum 0,1,2 stor0111,stor0113,stor0115
> >>>    osdmap e19536: 50 osds: 50 up, 50 in
> >>>     pgmap v928610: 2752 pgs, 9 pools, 30476 GB data, 4183 kobjects
> >>>           91513 GB used, 47642 GB / 135 TB avail
> >>>               2752 active+clean
> >>>
> >>>
> >>> Tried using RBD:
> >>>
> >>> # dd if=/dev/zero of=file1 bs=4K count=10000 oflag=direct
> >>> 10000+0 records in
> >>> 10000+0 records out
> >>> 40960000 bytes (41 MB) copied, 24.5529 s, 1.7 MB/s
> >>>
> >>> # dd if=/dev/zero of=file1 bs=1M count=100 oflag=direct
> >>> 100+0 records in
> >>> 100+0 records out
> >>> 104857600 bytes (105 MB) copied, 1.05602 s, 9.3 MB/s
> >>>
> >>> # dd if=/dev/zero of=file1 bs=1G count=1 oflag=direct
> >>> 1+0 records in
> >>> 1+0 records out
> >>> 1073741824 bytes (1.1 GB) copied, 293.551 s, 3.7 MB/s
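Regarding the "1 requests are blocked > 32 sec" warning in the status above: a
short sketch of commands that can help narrow down where the slow request is
stuck (osd.12 below is only a placeholder ID; the daemon command has to be run
on the node hosting that OSD):

# ceph health detail
    (names the OSDs on which requests are blocked > 32 sec)
# ceph osd perf
    (per-OSD commit/apply latency; a single slow disk usually stands out here)
# ceph daemon osd.12 dump_historic_ops
    (shows the recent slowest ops on that OSD and where they spent their time)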
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com