Has anyone heard about osd_find_best_info_ignore_history_les = true? Would it be useful here? There is so little information about it.
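For what it's worth, osd_find_best_info_ignore_history_les is usually described as a last-resort peering override: it makes the OSD ignore the last_epoch_started history when picking the authoritative PG info, which can let a PG stuck "incomplete" peer again at the risk of discarding its most recent writes. A minimal sketch of how it tends to be applied and then reverted, assuming a systemd deployment; the OSD and PG ids are only the examples that appear later in this thread:

    # ceph.conf on the host of the stuck PG's acting OSDs; do not leave this enabled
    [osd]
        osd_find_best_info_ignore_history_les = true

    systemctl restart ceph-osd@90    # restart the affected OSD so the option takes effect
    ceph pg 67.54f query             # check whether the PG peers, then remove the option and restart again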
Goktug Yildirim <goktug.yildi...@gmail.com> wrote on 2 Oct 2018 22:11:
> Hi,
>
> Indeed, I left ceph-disk to decide the WAL and DB partitions when I read somewhere that it would do the proper sizing.
> For the bluestore cache size I have plenty of RAM. I will increase it to 8GB for each OSD and decide on a more calculated number after the cluster settles.
>
> For the OSD map loading I have also figured it out, and it is in a loop. For that reason I started the cluster with the noup flag and waited for the OSDs to reach the up-to-date epoch number. After that I unset noup. But I did not pay attention to the manager logs. Let me check it, thank you!
>
> I am not forcing jemalloc or anything else, really. I have a very standard installation with no tweaks or tunings. All we asked for was stability over speed from the beginning. And here we are :/
>
>> On 2 Oct 2018, at 21:53, Darius Kasparavičius <daz...@gmail.com> wrote:
>>
>> Hi,
>>
>> I can see some issues from the OSD log file. You have extremely small DB and WAL partitions: only 1GB for the DB and 576MB for the WAL. I would recommend cranking up the rocksdb cache size as much as possible. If you have the RAM, you can also increase bluestore's cache size for HDD. The default is 1GB; be as liberal as you can without getting OOM kills. You also have lots of OSD map loading and decoding in the log. Are you sure all monitors/managers/OSDs are up to date? Also make sure you aren't forcing jemalloc loading; I had a funny interaction with that after upgrading to mimic.
>>
>> On Tue, Oct 2, 2018 at 9:02 PM Goktug Yildirim <goktug.yildi...@gmail.com> wrote:
>>>
>>> Hello Darius,
>>>
>>> Thanks for the reply!
>>>
>>> The main problem is that we cannot query PGs. "ceph pg 67.54f query" gets stuck and waits forever because the OSD is unresponsive.
>>> We are certain that an OSD becomes unresponsive as soon as it is up, and we are certain that it responds again once its disk utilization stops.
>>>
>>> So we ran a small test like this:
>>> * Stop all OSDs (168 of them).
>>> * Start OSD1. 95% disk utilization starts immediately and takes 8 minutes to finish. Only after that does "ceph pg 67.54f query" work!
>>> * While OSD1 is up, start OSD2. As soon as OSD2 starts, both OSD1 and OSD2 go to 95% disk utilization. This takes 17 minutes to finish.
>>> * Now start OSD3 and it is the same. All OSDs start heavy I/O and it takes 25 minutes to settle.
>>> * If you start 5 of them at the same time, all of the OSDs start heavy I/O again, and it takes 1 hour to finish.
>>>
>>> So in light of these findings we set the noup flag and started all OSDs. At first there was no I/O. After 10 minutes we unset noup. All 168 OSDs started heavy I/O. We thought that if we waited long enough it would finish and the OSDs would be responsive again. After 24 hours they were not, because the I/O did not finish or even slow down.
>>> One might think there is a lot of data to scan, but it is just 33TB.
>>>
>>> So, in short, we do not know which PG is stuck so that we could remove it.
>>>
>>> However, we hit a weird thing half an hour ago. We exported the same PG from two different OSDs. One export was 4.2GB and the other was 500KB! So we decided to export all OSDs for backup. Then we will delete the strangely sized ones and start the cluster all over. Maybe then we can solve the stuck or unfound PGs as you advise.
>>>
>>> Any thought would be greatly appreciated.
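For anyone following the export step described above: the per-OSD copy of a PG is normally exported with ceph-objectstore-tool against a stopped OSD, which is also the easiest way to compare export sizes offline. A minimal sketch, assuming a systemd deployment and the default data path; the OSD and PG ids are just the examples from this thread:

    systemctl stop ceph-osd@90
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-90 \
        --pgid 67.54f --op export --file /root/osd90-pg67.54f.export
    systemctl start ceph-osd@90
    # a saved copy can later be restored onto a (stopped) OSD with --op import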
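On the cache sizing discussed above, a minimal ceph.conf sketch of the kind of change being suggested; 8 GiB is the figure mentioned in the reply, the kv_max line is an assumption about how to give rocksdb a larger share of that cache, and the totals have to fit within the node's free RAM per OSD:

    [osd]
        # per-OSD bluestore cache for HDD-backed OSDs (default 1 GiB)
        bluestore_cache_size_hdd = 8589934592
        # allow more of that cache to be used for rocksdb (kv) metadata
        bluestore_cache_kv_max = 2147483648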
>>>
>>>> On 2 Oct 2018, at 18:16, Darius Kasparavičius <daz...@gmail.com> wrote:
>>>>
>>>> Hello,
>>>>
>>>> Currently you have 15 objects missing. I would recommend finding them and making backups of them. Ditch all other OSDs that are failing to start and concentrate on bringing online those that have the missing objects. Then slowly turn off nodown and noout on the cluster and see if it stabilises. If it stabilises, leave those settings; if not, turn them back on.
>>>> Now take some of the PGs that are blocked and query them to check why they are blocked. Try removing as many blockers as possible, then remove the norebalance/norecovery flags and see if it starts to fix itself.
>>>>
>>>> On Tue, Oct 2, 2018 at 5:14 PM by morphin <morphinwith...@gmail.com> wrote:
>>>>>
>>>>> One of the ceph experts indicated that bluestore is somewhat of a preview technology (at least for Red Hat).
>>>>> So it could be best to check out bluestore and rocksdb. There are some tools to check health and also repair, but there is limited documentation.
>>>>> Does anyone have experience with them?
>>>>> Any lead/help towards a proper check would be great.
>>>>>
>>>>> Goktug Yildirim <goktug.yildi...@gmail.com> wrote on Mon, 1 Oct 2018, 22:55:
>>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> We recently upgraded from luminous to mimic. It has been 6 days since this cluster went offline. The long and short of the story is here:
>>>>>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-September/030078.html
>>>>>>
>>>>>> I have also CC'ed the developers, since I believe this is a bug. If this is not the correct way, I apologize; please let me know.
>>>>>>
>>>>>> Over these 6 days a lot of things happened and there were some findings about the problem. Some of them were misjudged and some were not looked into deeply enough.
>>>>>> However, the most certain diagnosis is this: each OSD causes very high disk I/O on its bluestore disk (WAL and DB are fine). After that, the OSDs become unresponsive or respond very slowly. For example, "ceph tell osd.x version" gets stuck seemingly forever.
>>>>>>
>>>>>> So, because of the unresponsive OSDs, the cluster does not settle. This is our problem!
>>>>>>
>>>>>> This is the one thing we are very sure of. But we are not sure of the reason.
>>>>>>
>>>>>> Here is the latest ceph status:
>>>>>> https://paste.ubuntu.com/p/2DyZ5YqPjh/
>>>>>>
>>>>>> This is the status after we started all of the OSDs 24 hours ago. Some of the OSDs have not started. However, it did not make any difference when all of them were online.
>>>>>>
>>>>>> Here is the debug=20 log of one OSD, which is the same for all the others:
>>>>>> https://paste.ubuntu.com/p/8n2kTvwnG6/
>>>>>> As far as we can figure out there is a loop pattern. I am sure it won't be caught by eye.
>>>>>>
>>>>>> This is the full log of the same OSD:
>>>>>> https://www.dropbox.com/s/pwzqeajlsdwaoi1/ceph-osd.90.log?dl=0
>>>>>>
>>>>>> Here is the strace of the same OSD process:
>>>>>> https://paste.ubuntu.com/p/8n2kTvwnG6/
>>>>>>
>>>>>> Recently we hear more and more advice to upgrade to mimic. I hope no one gets hurt by it as we did. I am sure we made plenty of mistakes to let this happen, and this situation may serve as an example for other users and could point to a potential bug for the ceph developers.
>>>>>>
>>>>>> Any help in figuring out what is going on would be great.
>>>>>>
>>>>>> Best Regards,
>>>>>> Goktug Yildirim
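On the question in the quoted thread about tools to check and repair bluestore/rocksdb: ceph-bluestore-tool can run an offline consistency check and a repair pass against a stopped OSD. A minimal sketch, assuming a systemd deployment and the default data path (osd.90 is just the example from this thread); a deep fsck additionally validates object data and can take hours on a full disk:

    systemctl stop ceph-osd@90
    ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-90      # consistency check, read-only
    ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-90    # attempt to fix what fsck reports
    systemctl start ceph-osd@90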