(Didn't follow the whole story, so you might have already answered that)
Did you check what the OSDs are doing during the period of high disk
utilization?
As in:

* running perf top
* sampling a few stack traces from procfs or gdb
* or just temporarily raising the log levels
* running "status" on the admin socket locally


Paul

On Tue, 2 Oct 2018 at 20:02, Goktug Yildirim
<goktug.yildi...@gmail.com> wrote:
>
> Hello Darius,
>
> Thanks for reply!
>
> The main problem is that we cannot query PGs. “ceph pg 67.54f query” gets
> stuck and waits forever because the OSD is unresponsive.
> We are certain that an OSD becomes unresponsive as soon as it is up, and that
> it responds again only after its disk utilization stops.
>
> So we ran a small test like this (a rough sketch of the commands is below):
> * Stop all OSDs (168 of them).
> * Start OSD1. Its disk utilization immediately jumps to ~95% and takes 8
> minutes to settle. Only after that does “ceph pg 67.54f query” work!
> * While OSD1 is up, start OSD2. As soon as OSD2 starts, both OSD1 and OSD2
> go to ~95% disk utilization. This takes 17 minutes to finish.
> * Now start OSD3 and it is the same: all started OSDs begin heavy I/O and it
> takes 25 minutes to settle.
> * If you start 5 of them at the same time, all of the OSDs start heavy I/O
> again, and it takes 1 hour to finish.
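>
> The commands were along these lines (OSD ids are examples; disk
> utilization was watched with iostat on each node):
>
>     systemctl stop ceph-osd.target     # stop all OSDs on the host
>     systemctl start ceph-osd@1         # start a single OSD
>     iostat -x 1                        # watch %util on its data disk
>     ceph pg 67.54f query               # only answers once %util drops
>     systemctl start ceph-osd@2         # next OSD; both go busy again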
>
> In the light of these findings we set the noup flag and started all OSDs. At
> first there was no I/O. After 10 minutes we unset noup, and all 168 OSDs
> started doing heavy I/O. We thought that if we waited long enough it would
> finish and the OSDs would become responsive again. After 24 hours they had
> not, because the I/O never finished or even slowed down.
> One could think there is a lot of data to scan, but it is only 33TB.
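>
> For completeness, the flag handling was nothing special, essentially just:
>
>     ceph osd set noup                  # OSDs boot but are not marked up
>     systemctl start ceph-osd.target    # start all OSDs (per host), no I/O at first
>     ceph osd unset noup                # ~10 minutes later; heavy I/O begins on all 168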
>
> So, in short, we do not know which PG is stuck, so we cannot remove it.
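>
> Presumably the mon-side commands for listing them would be something like
> the following, but without a working PG query they only tell us which PGs
> are stuck, not why:
>
>     ceph health detail            # blocked requests, stuck PGs, OSDs involved
>     ceph pg dump_stuck inactive
>     ceph pg dump_stuck unclean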
>
> However, we ran into something weird half an hour ago. We exported the same
> PG from two different OSDs: one export was 4.2GB and the other was 500KB! So
> we decided to export the PGs from all OSDs as a backup. Then we will delete
> the strangely sized ones and start the cluster over. Maybe then we can
> resolve the stuck or unfound PGs as you advise.
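>
> The exports are done roughly like this, with the OSD stopped (data path, PG
> id and output file are just examples):
>
>     ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1 \
>         --pgid 67.54f --op export --file /backup/osd1-67.54f.export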
>
> Any thoughts would be greatly appreciated.
>
>
> > On 2 Oct 2018, at 18:16, Darius Kasparavičius <daz...@gmail.com> wrote:
> >
> > Hello,
> >
> > Currently you have 15 objects missing. I would recommend finding them
> > and making backups of them. Ditch all the other OSDs that are failing to
> > start and concentrate on bringing online the ones that hold the missing
> > objects. Then slowly turn off nodown and noout on the cluster and see
> > if it stabilises. If it stabilises, leave these settings; if not, turn
> > them back on.
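> >
> > Something along these lines (standard flag handling):
> >
> >     ceph osd unset nodown
> >     ceph osd unset noout
> >     # and if things get worse, put them back:
> >     ceph osd set nodown
> >     ceph osd set noout
> >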
> > Now get some of the PGs that are blocked and query them to check
> > why they are blocked. Try removing as many blockers as possible, then
> > remove the norebalance/norecover flags and see if the cluster starts to
> > fix itself.
> >
> > On Tue, Oct 2, 2018 at 5:14 PM, morphin
> > <morphinwith...@gmail.com> wrote:
> >>
> >> One of the Ceph experts indicated that BlueStore is still somewhat of a
> >> preview technology (at least as far as Red Hat is concerned).
> >> So it might be best to check BlueStore and RocksDB. There are some tools
> >> to check their health and also to repair them, but the documentation is
> >> limited.
> >> Does anyone have experience with them?
> >> Any lead/help towards a proper check would be great.
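> >>
> >> The tools I have in mind are roughly these, run against a stopped OSD
> >> (the data path is just an example):
> >>
> >>     ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-1
> >>     ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-1
> >>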
> >> On Mon, 1 Oct 2018 at 22:55, Goktug Yildirim
> >> <goktug.yildi...@gmail.com> wrote:
> >>>
> >>> Hi all,
> >>>
> >>> We recently upgraded from Luminous to Mimic. The cluster has now been
> >>> offline for 6 days. The long story short is here:
> >>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-September/030078.html
> >>>
> >>> I have also CC'ed the developers since I believe this is a bug. If this is
> >>> not the correct way to do so, I apologize; please let me know.
> >>>
> >>> Over these 6 days a lot of things happened and we reached some conclusions
> >>> about the problem. Some of them were misjudged and some were not
> >>> investigated deeply enough.
> >>> However, the most certain diagnosis is this: each OSD generates very high
> >>> disk I/O on its BlueStore data disk (WAL and DB are fine). After that the
> >>> OSDs become unresponsive, or respond very slowly. For example, "ceph tell
> >>> osd.x version" gets stuck seemingly forever.
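> >>>
> >>> If it helps, we can also collect output from the local admin socket of a
> >>> busy OSD (osd.90 is just the one whose log is linked below), e.g.:
> >>>
> >>>     ceph daemon osd.90 ops                # ops currently in flight
> >>>     ceph daemon osd.90 dump_historic_ops  # recently completed ops and their latencies
> >>>     ceph daemon osd.90 perf dump          # internal perf counters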
> >>>
> >>> So, because of the unresponsive OSDs, the cluster does not settle. This is our problem!
> >>>
> >>> This is the part we are very sure of, but we are not sure of the cause.
> >>>
> >>> Here is the latest ceph status:
> >>> https://paste.ubuntu.com/p/2DyZ5YqPjh/.
> >>>
> >>> This is the status after we started all of the OSDs 24 hours ago.
> >>> Some of the OSDs have not been started; however, it did not make any
> >>> difference when all of them were online.
> >>>
> >>> Here is the debug=20 log of one OSD, which looks the same for all the
> >>> others:
> >>> https://paste.ubuntu.com/p/8n2kTvwnG6/
> >>> As far as we can tell there is a loop pattern, though I am sure it is not
> >>> easy to catch by eye.
> >>>
> >>> This is the full log of the same OSD.
> >>> https://www.dropbox.com/s/pwzqeajlsdwaoi1/ceph-osd.90.log?dl=0
> >>>
> >>> Here is the strace of the same OSD process:
> >>> https://paste.ubuntu.com/p/8n2kTvwnG6/
> >>>
> >>> Recently we hear of more and more people upgrading to Mimic. I hope no one
> >>> else gets hurt the way we did. I am sure we made plenty of mistakes to let
> >>> this happen, but this situation may serve as an example for other users and
> >>> could point to a potential bug for the Ceph developers.
> >>>
> >>> Any help to figure out what is going on would be great.
> >>>
> >>> Best Regards,
> >>> Goktug Yildirim
>


-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
