I was able to collect dump data during a slow request, but this time I saw
that it coincided with high load average and iowait, so I'm continuing to watch.
Today it was on two particular OSDs, but yesterday it was on other OSDs.
In the dumps of these two OSDs I see that operations are stuck on
queued_for_pg, for example:

            "description": "osd_op(client.13057605.0:51528 17.15
17:a93a5511:::notify.2:head [watch ping cookie 94259433737472] snapc
0=[] ondisk+write+known_if_redirected e10936)",
            "initiated_at": "2017-10-20 12:34:29.134946",
            "age": 484.314936,
            "duration": 55.421058,
            "type_data": {
                "flag_point": "started",
                "client_info": {
                    "client": "client.13057605",
                    "client_addr": "10.192.1.78:0/3748652520",
                    "tid": 51528
                },
                "events": [
                    {
                        "time": "2017-10-20 12:34:29.134946",
                        "event": "initiated"
                    },
                    {
                        "time": "2017-10-20 12:34:29.135075",
                        "event": "queued_for_pg"
                    },
                    {
                        "time": "2017-10-20 12:35:24.555957",
                        "event": "reached_pg"
                    },
                    {
                        "time": "2017-10-20 12:35:24.555978",
                        "event": "started"
                    },
                    {
                        "time": "2017-10-20 12:35:24.556004",
                        "event": "done"
                    }
                ]
            }
        },
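
(The op above waited about 55 seconds between queued_for_pg and reached_pg, so
it sat in the PG queue. Regarding my question below about the history interval:
it looks like the window is controlled by osd_op_history_size and
osd_op_history_duration, which can be raised through the admin socket; the
values here are only examples:)

    # on the OSD host: keep more ops in the history, and keep them for longer
    ceph daemon osd.<id> config set osd_op_history_size 500
    ceph daemon osd.<id> config set osd_op_history_duration 3600

    # then pull out the queued_for_pg / reached_pg timestamps to see the queue wait
    ceph daemon osd.<id> dump_historic_ops | grep -B1 -E '"queued_for_pg"|"reached_pg"'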


I've read the thread
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-October/021588.html.
It describes a very similar problem; could this be connected to Proxmox? I have
a quite old version of proxmox-ve (4.4-80), and Ceph Jewel clients on the PVE
nodes.
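
(In case it's relevant: on luminous the cluster can at least show which client
releases and feature bits are connecting, so the Jewel clients should be
visible there. Something like this, though I'm not sure it points at the cause:)

    ceph features    # releases/feature sets reported by mons, osds and clients
    ceph versions    # daemon versions, for comparison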

Best regards,
Olga Ukhina

Mobile: 8(905)-566-46-62

2017-10-20 11:05 GMT+03:00 Ольга Ухина <olga.uh...@gmail.com>:

> Hi! Thanks for your help.
> How can I increase the history interval for the command ceph daemon osd.<id>
> dump_historic_ops? It only shows the last several minutes.
> I see slow requests on random OSDs each time, and on different hosts (there
> are three). From what I see in the logs, the problem is not related to
> scrubbing.
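>
> (Roughly what I was comparing in the cluster log, /var/log/ceph/ceph.log on a
> mon host here, in case there is a better check:)
>
>     # deep-scrub start/finish messages vs. slow request warnings
>     grep 'deep-scrub' /var/log/ceph/ceph.log
>     grep 'REQUEST_SLOW' /var/log/ceph/ceph.log
>
>     # PGs (and hence OSDs) deep-scrubbing right now
>     ceph pg dump | grep 'scrubbing+deep'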
>
> Regards,
> Olga Ukhina
>
>
> 2017-10-20 4:42 GMT+03:00 Brad Hubbard <bhubb...@redhat.com>:
>
>> I guess you have both read and followed
>> http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/?highlight=backfill#debugging-slow-requests
>>
>> What was the result?
>>
>> On Fri, Oct 20, 2017 at 2:50 AM, J David <j.david.li...@gmail.com> wrote:
>> > On Wed, Oct 18, 2017 at 8:12 AM, Ольга Ухина <olga.uh...@gmail.com> wrote:
>> >> I have a problem with ceph luminous 12.2.1.
>> >> […]
>> >> I have slow requests on different OSDs at random times (for example at
>> >> night), but I don’t see any other problems at the time they occur.
>> >> […]
>> >> 2017-10-18 01:20:38.187326 mon.st3 mon.0 10.192.1.78:6789/0 22689 : cluster
>> >> [WRN] Health check update: 49 slow requests are blocked > 32 sec
>> >> (REQUEST_SLOW)
>> >
>> > This looks almost exactly like what we have been experiencing, and
>> > your use-case (Proxmox client using rbd) is the same as ours as well.
>> >
>> > Unfortunately we were not able to find the source of the issue so far,
>> > and haven’t gotten much feedback from the list.  Extensive testing of
>> > every component has ruled out any hardware issue we can think of.
>> >
>> > Originally we thought our issue was related to deep-scrub, but that
>> > now appears not to be the case, as it happens even when nothing is
>> > being deep-scrubbed.  Nonetheless, although they aren’t the cause,
>> > they definitely make the problem much worse.  So you may want to check
>> > to see if deep-scrub operations are happening at the times where you
>> > see issues and (if so) whether the OSDs participating in the
>> > deep-scrub are the same ones reporting slow requests.
>> >
>> > Hopefully you have better luck finding/fixing this than we have!  It’s
>> > definitely been a very frustrating issue for us.
>> >
>> > Thanks!
>>
>>
>>
>> --
>> Cheers,
>> Brad
>>
>
>
