(Didn't follow the whole story, so you might have already answered that) Did you check what the OSDs are doing during the period of high disk utilization? As in:

* running perf top
* sampling a few stack traces from procfs or gdb
* or just high log settings
* running "status" on the admin socket locally
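For reference, a rough sketch of those checks, run locally on the node of one affected OSD. The OSD id (osd.1), systemd unit name and default admin socket path below are only examples; adjust them to your deployment:

    # pid of the OSD in question (osd.1 here)
    OSD_PID=$(systemctl show -p MainPID ceph-osd@1 | cut -d= -f2)

    # CPU profile of the busy process
    perf top -p "$OSD_PID"

    # kernel-side stacks of all its threads via procfs (run as root)
    for f in /proc/"$OSD_PID"/task/*/stack; do echo "== $f"; cat "$f"; done

    # userspace stacks; briefly pauses the daemon
    gdb -p "$OSD_PID" --batch -ex 'thread apply all bt'

    # temporarily raise the log level
    ceph daemon osd.1 config set debug_osd 20
    ceph daemon osd.1 config set debug_bluestore 20

    # "status" on the admin socket, locally on the OSD node
    ceph daemon osd.1 status
    # or with the socket path spelled out:
    ceph --admin-daemon /var/run/ceph/ceph-osd.1.asok status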
Paul

On Tue, 2 Oct 2018 at 20:02, Goktug Yildirim <goktug.yildi...@gmail.com> wrote:
>
> Hello Darius,
>
> Thanks for the reply!
>
> The main problem is that we cannot query PGs. "ceph pg 67.54f query" gets stuck and waits forever, because the OSD is unresponsive.
> We are certain that an OSD becomes unresponsive as soon as it is up, and that it responds again once its disk utilization stops.
>
> So we ran a small test:
> * Stop all OSDs (168 of them).
> * Start OSD1. 95% OSD disk utilization starts immediately and takes 8 minutes to finish. Only after that does "ceph pg 67.54f query" work!
> * While OSD1 is up, start OSD2. As soon as OSD2 starts, OSD1 and OSD2 both go to 95% disk utilization. This takes 17 minutes to finish.
> * Now start OSD3 and it is the same: all OSDs start high I/O, and it takes 25 minutes to settle.
> * If you happen to start 5 of them at the same time, all of the OSDs start high I/O again, and it takes 1 hour to finish.
>
> In light of these findings we set the noup flag and started all OSDs. At first there was no I/O. After 10 minutes we unset noup, and all 168 OSDs started doing high I/O. We thought that if we waited long enough it would finish and the OSDs would become responsive again. After 24 hours they had not, because the I/O never finished or even slowed down.
> One might think there is a lot of data to scan, but it is just 33TB.
>
> So, in short, we don't know which PG is stuck, so we cannot remove it.
>
> However, we ran into a weird thing half an hour ago. We exported the same PG from two different OSDs: one export was 4.2GB and the other 500KB! So we decided to export the PGs on all OSDs as a backup. Then we will delete the strange-sized ones and start the cluster over. Maybe then we can deal with the stuck or unfound PGs as you advise.
>
> Any thought would be greatly appreciated.
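(A side note on the PG exports mentioned above: assuming they were done the usual way, i.e. with ceph-objectstore-tool against a stopped OSD, a per-PG export looks roughly like the sketch below. The OSD id, data path and output file are only examples; the PG id is the one from this thread.)

    # stop the OSD so its object store is quiescent
    systemctl stop ceph-osd@1

    # list the PGs this OSD holds
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1 --op list-pgs

    # export one PG to a file for safekeeping
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1 \
        --pgid 67.54f --op export --file /root/pg-backup/osd1-67.54f.export

Comparing the size of the same PG exported from different OSDs, as described above, is a reasonable way to spot the suspicious copies.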
> On 2 Oct 2018, at 18:16, Darius Kasparavičius <daz...@gmail.com> wrote:
> >
> > Hello,
> >
> > Currently you have 15 objects missing. I would recommend finding them
> > and making backups of them. Ditch all the other OSDs that are failing to
> > start and concentrate on bringing online those that hold the missing
> > objects. Then slowly turn off nodown and noout on the cluster and see
> > if it stabilises. If it stabilises, leave those settings; if not, turn
> > them back on.
> > Now take some of the PGs that are blocked and query them to check
> > why they are blocked. Try removing as many of the blockers as possible,
> > then remove the norebalance/norecovery flags and see if it starts to fix
> > itself.
> >
> > On Tue, Oct 2, 2018 at 5:14 PM by morphin <morphinwith...@gmail.com> wrote:
> >>
> >> One of the ceph experts indicated that bluestore is somewhat preview tech
> >> (according to Red Hat, at least).
> >> So it could be best to check bluestore and rocksdb. There are some
> >> tools to check health and also to repair, but the documentation is limited.
> >> Does anyone have experience with them?
> >> Anyone who could lead/help us to a proper check would be great.
> >>
> >> On Mon, 1 Oct 2018 at 22:55, Goktug Yildirim <goktug.yildi...@gmail.com> wrote:
> >>>
> >>> Hi all,
> >>>
> >>> We have recently upgraded from luminous to mimic. It's been 6 days since
> >>> this cluster went offline. The long short story is here:
> >>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-September/030078.html
> >>>
> >>> I've also CC'ed the developers since I believe this is a bug. If this is not
> >>> the correct way, I apologize; please let me know.
> >>>
> >>> Over those 6 days a lot of things happened and there were some findings about
> >>> the problem. Some of them were misjudged and some were not examined deeply
> >>> enough.
> >>> However, the most certain diagnosis is this: each OSD causes very high
> >>> disk I/O on its bluestore data disk (WAL and DB are fine). After that the OSDs
> >>> become unresponsive, or respond extremely slowly. For example, "ceph tell
> >>> osd.x version" hangs seemingly forever.
> >>>
> >>> So, because of the unresponsive OSDs, the cluster does not settle. This is our problem!
> >>>
> >>> This much we are very sure of; we are just not sure of the reason.
> >>>
> >>> Here is the latest ceph status:
> >>> https://paste.ubuntu.com/p/2DyZ5YqPjh/
> >>>
> >>> This is the status after we started all of the OSDs 24 hours ago.
> >>> Some of the OSDs are not started, but that didn't make any difference
> >>> when all of them were online.
> >>>
> >>> Here is the debug=20 log of one OSD, which looks the same for all the others:
> >>> https://paste.ubuntu.com/p/8n2kTvwnG6/
> >>> As far as we can tell there is a loop pattern; I am sure it is hard to catch
> >>> by eye.
> >>>
> >>> This is the full log of the same OSD:
> >>> https://www.dropbox.com/s/pwzqeajlsdwaoi1/ceph-osd.90.log?dl=0
> >>>
> >>> Here is the strace of the same OSD process:
> >>> https://paste.ubuntu.com/p/8n2kTvwnG6/
> >>>
> >>> Recently we have been hearing more advice to upgrade to mimic; I hope nobody
> >>> gets hurt the way we did. I am sure we made plenty of mistakes to let this
> >>> happen, and this situation may serve as an example for other users and point
> >>> to a potential bug for the ceph developers.
> >>>
> >>> Any help to figure out what is going on would be great.
> >>>
> >>> Best Regards,
> >>> Goktug Yildirim

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
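A follow-up on the "ceph tell osd.x version" hangs described in the quoted report: the admin socket is answered locally by the daemon and does not go through the cluster messenger, so it is often still usable on the affected node and can show what the OSD is actually stuck on. A short sketch; osd.90 (from the log linked above) is used purely as an example:

    # ops the OSD is currently working on or sitting on
    ceph daemon osd.90 dump_ops_in_flight
    ceph daemon osd.90 dump_blocked_ops

    # recently completed slow ops, with per-stage timing
    ceph daemon osd.90 dump_historic_ops

    # internal counters; bluestore and rocksdb activity shows up here
    ceph daemon osd.90 perf dump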