I mistyped the user list mail address. I am correcting and sending again. Apologies for the noise.
My mail is below.

Begin forwarded message:

> From: Goktug Yildirim <goktug.yildi...@gmail.com>
> Date: 1 October 2018 21:54:31 GMT+2
> To: ceph-users-j...@lists.ceph.com
> Cc: ceph-de...@vger.kernel.org
> Subject: Mimic offline problem
>
> Hi all,
>
> We recently upgraded from Luminous to Mimic. This cluster has been offline
> for 6 days. The long story short is here:
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-September/030078.html
>
> I've also CC'ed the developers since I believe this is a bug. If this is not
> the correct way, I apologize; please let me know.
>
> Over these 6 days a lot of things happened and we reached some conclusions
> about the problem. Some of them were misjudged and some were not looked into
> deeply enough. However, the most certain diagnosis is this: each OSD causes
> very high disk I/O on its bluestore disk (WAL and DB are fine). After that,
> the OSDs become unresponsive or barely responsive. For example,
> "ceph tell osd.x version" hangs seemingly forever.
>
> So, because of the unresponsive OSDs, the cluster does not settle. This is
> our problem!
>
> This is the one thing we are very sure of. But we are not sure of the reason.
>
> Here is the latest ceph status:
> https://paste.ubuntu.com/p/2DyZ5YqPjh/
>
> This is the status 24 hours after we started all of the OSDs.
> Some of the OSDs did not start. However, it didn't make any difference when
> all of them were online.
>
> Here is the debug=20 log of one OSD, which is the same for all the others:
> https://paste.ubuntu.com/p/8n2kTvwnG6/
> As far as we can tell there is a loop pattern. I am sure it will catch your
> eye.
>
> This is the full log of the same OSD:
> https://www.dropbox.com/s/pwzqeajlsdwaoi1/ceph-osd.90.log?dl=0
>
> Here is the strace of the same OSD process:
> https://paste.ubuntu.com/p/8n2kTvwnG6/
>
> Lately we hear more and more advice to upgrade to Mimic. I hope no one gets
> hurt as we did. I am sure we made lots of mistakes to let this happen. This
> situation may serve as an example for other users and could point to a
> potential bug for the Ceph developers.
>
> Any help to figure out what is going on would be great.
>
> Best Regards,
> Goktug Yildirim
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com