Thanks for the reply! My answers are inline. (I have also sketched the commands
we are running or planning to run at the bottom of this mail, below the quoted
thread.)

> On 2 Oct 2018, at 21:51, Paul Emmerich <paul.emmer...@croit.io> wrote:
>
> (Didn't follow the whole story, so you might have already answered this.)
> Did you check what the OSDs are doing during the period of high disk
> utilization? As in:
>
> * running perf top

That did not cross my mind. Thanks for the pointer! Will do.

> * sampling a few stack traces from procfs or gdb

I have an strace of one OSD: https://paste.ubuntu.com/p/8n2kTvwnG6/

> * or just high log settings

They run with default debug settings, and the log disk is separate. The
hardware is fairly fast: the OS disks are mirrored SSDs, the WALs+DBs are on
mirrored NVMe, and the OSD disks are NL-SAS. All hardware came from Dell
(R730), with 28 cores and 256 GB RAM per server, plus 2x10GbE for the cluster
network and 2x10GbE for the public network.

> * running "status" on the admin socket locally

I can run the daemon command and see the status. I am fairly sure I already
checked it, but I will do so again.

>
> Paul
>
> On Tue, 2 Oct 2018 at 20:02, Goktug Yildirim
> <goktug.yildi...@gmail.com> wrote:
>>
>> Hello Darius,
>>
>> Thanks for the reply!
>>
>> The main problem is that we cannot query PGs. "ceph pg 67.54f query" gets
>> stuck and waits forever, because the OSD is unresponsive.
>> We are certain that an OSD becomes unresponsive as soon as it is up, and
>> that it responds again only after its disk utilization stops.
>>
>> So we ran a small test like this:
>> * Stop all OSDs (168 of them).
>> * Start OSD1. 95% utilization of its OSD disk starts immediately and takes
>>   8 minutes to finish. Only after that does "ceph pg 67.54f query" work!
>> * While OSD1 is up, start OSD2. As soon as OSD2 starts, both OSD1 and OSD2
>>   go to 95% disk utilization. This takes 17 minutes to finish.
>> * Now start OSD3 and it is the same: all three OSDs start heavy I/O, and it
>>   takes 25 minutes to settle.
>> * If we start 5 of them at the same time, all of those OSDs start heavy I/O
>>   again, and it takes 1 hour to finish.
>>
>> In light of these findings we set the noup flag and started all OSDs. At
>> first there was no I/O. After 10 minutes we unset noup, and all 168 OSDs
>> started heavy I/O. We thought that if we waited long enough the I/O would
>> finish and the OSDs would become responsive again. After 24 hours they had
>> not, because the I/O never finished or even slowed down.
>> One could think there is a lot of data to scan, but it is only 33 TB.
>>
>> So, in short, we do not know which PG is stuck, so we cannot remove it.
>>
>> However, we came across a weird thing half an hour ago. We exported the
>> same PG from two different OSDs: one export was 4.2 GB and the other was
>> 500 KB! So we decided to export the PGs from all OSDs as a backup. Then we
>> will delete the strangely sized ones and start the cluster over. Maybe then
>> we can resolve the stuck or unfound PGs as you advise.
>>
>> Any thoughts would be greatly appreciated.
>>
>>
>>> On 2 Oct 2018, at 18:16, Darius Kasparavičius <daz...@gmail.com> wrote:
>>>
>>> Hello,
>>>
>>> Currently you have 15 objects missing. I would recommend finding them
>>> and making backups of them. Ditch all the other OSDs that are failing to
>>> start and concentrate on bringing online those that have the missing
>>> objects. Then slowly turn off nodown and noout on the cluster and see
>>> if it stabilises. If it stabilises, leave those settings off; if not,
>>> turn them back on.
>>> Now take some of the PGs that are blocked and query them to check why
>>> they are blocked. Try removing as many of the blockers as possible, then
>>> remove the norebalance/norecover flags and see if the cluster starts to
>>> fix itself.
>>> On Tue, Oct 2, 2018 at 5:14 PM, morphin
>>> <morphinwith...@gmail.com> wrote:
>>>>
>>>> One of the Ceph experts indicated that bluestore is still somewhat
>>>> preview tech (at least as far as Red Hat is concerned).
>>>> So it could be best to check bluestore and rocksdb. There are some
>>>> tools to check their health and also to repair them, but the
>>>> documentation is limited.
>>>> Does anyone have experience with them?
>>>> Any lead/help towards a proper check would be great.
>>>>
>>>> On Mon, 1 Oct 2018 at 22:55, Goktug Yildirim
>>>> <goktug.yildi...@gmail.com> wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> We recently upgraded from Luminous to Mimic. It has been 6 days since
>>>>> this cluster went offline. The long story so far is here:
>>>>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-September/030078.html
>>>>>
>>>>> I have also CC'ed the developers, since I believe this is a bug. If
>>>>> this is not the correct way to report it, I apologise; please let me
>>>>> know.
>>>>>
>>>>> Over these 6 days a lot has happened and there were some findings
>>>>> about the problem. Some of them were misjudged and some were not
>>>>> examined deeply enough.
>>>>> However, the most certain diagnosis is this: each OSD generates very
>>>>> high disk I/O on its bluestore data disk (the WAL and DB devices are
>>>>> fine). While that happens the OSDs become unresponsive, or respond
>>>>> very, very slowly. For example, "ceph tell osd.x version" hangs
>>>>> seemingly forever.
>>>>>
>>>>> Because of the unresponsive OSDs the cluster does not settle. This is
>>>>> our problem! This is the one thing we are very sure of; what we are
>>>>> not sure of is the reason.
>>>>>
>>>>> Here is the latest ceph status:
>>>>> https://paste.ubuntu.com/p/2DyZ5YqPjh/
>>>>>
>>>>> This is the status after we started all of the OSDs 24 hours ago.
>>>>> Some of the OSDs are not started, but it did not make any difference
>>>>> when all of them were online.
>>>>>
>>>>> Here is the debug=20 log of one OSD, which looks the same as all the
>>>>> others: https://paste.ubuntu.com/p/8n2kTvwnG6/
>>>>> As far as we can tell there is a loop pattern in it, although I am
>>>>> sure it is not easy to catch by eye.
>>>>>
>>>>> This is the full log of the same OSD:
>>>>> https://www.dropbox.com/s/pwzqeajlsdwaoi1/ceph-osd.90.log?dl=0
>>>>>
>>>>> Here is the strace of the same OSD process:
>>>>> https://paste.ubuntu.com/p/8n2kTvwnG6/
>>>>>
>>>>> Lately we hear more and more advice to upgrade to Mimic. I hope nobody
>>>>> else gets hurt the way we did. I am sure we made plenty of mistakes
>>>>> that let this happen, but this situation may serve as an example for
>>>>> other users and could point to a real bug for the Ceph developers.
>>>>>
>>>>> Any help in figuring out what is going on would be great.
>>>>>
>>>>> Best Regards,
>>>>> Goktug Yildirim
>>
>
>
> --
> Paul Emmerich
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90
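For the perf top / stack trace / admin socket suggestions above, this is
roughly what I plan to run on one of the OSD hosts. It is only a sketch:
osd.90 is just the OSD whose log I linked, and the pid lookup may need
adjusting to the way the ceph-osd processes are started on your systems.

  # CPU sampling of one busy OSD (perf from linux-tools):
  OSD_ID=90
  OSD_PID=$(pgrep -f "ceph-osd.*--id ${OSD_ID}( |$)")
  perf top -p "${OSD_PID}"

  # A few userspace stack traces without stopping the daemon:
  gdb -batch -ex 'thread apply all bt' -p "${OSD_PID}" > /tmp/osd.${OSD_ID}.bt

  # Kernel-side stacks of its threads, via procfs (as root):
  cat /proc/${OSD_PID}/task/*/stack > /tmp/osd.${OSD_ID}.kstack

  # Admin socket, run locally on the OSD host:
  ceph daemon osd.${OSD_ID} status
  ceph daemon osd.${OSD_ID} dump_historic_ops
  ceph daemon osd.${OSD_ID} perf dump

I expect this to show whether the time is going into rocksdb work or somewhere
else, and dump_historic_ops should show what the slow requests are stuck on.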
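For the PG checks and backups Darius suggested, this is the rough procedure we
are following. The PG id 67.54f is the one from our earlier test; the OSD id,
the data path and the backup destination are only examples and assume the
usual systemd units and default mount points.

  # See which PGs are stuck and why (the query only answers once the
  # owning OSD is responsive):
  ceph pg dump_stuck unclean
  ceph pg 67.54f query

  # Export a PG from a stopped OSD as a backup:
  systemctl stop ceph-osd@1
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1 \
      --pgid 67.54f --op export --file /backup/osd.1-pg.67.54f.export

Comparing the export sizes of the same PG taken from different OSDs is how we
spotted the 4.2 GB vs 500 KB difference mentioned above; an export can later
be put back into an OSD with --op import if needed.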
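On the bluestore/rocksdb question raised earlier in the thread: the checking
tool we know of is ceph-bluestore-tool. A minimal sketch, assuming the default
data path and a stopped OSD (osd.90 again is only an example); we will take
the PG exports first before attempting anything that writes.

  systemctl stop ceph-osd@90
  ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-90
  # A deeper fsck and a "repair" subcommand exist as well, but they can run
  # for a long time on a full OSD, so we will start with the plain fsck.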
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com