Hello Samuel, I'm a bit afraid of restarting my OSDs again, so I'll wait until the weekend to push the config. BTW, I just unset the sortbitwise flag.
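
The config in question is the set of debug settings Sam asked for. A minimal sketch, assuming they go in the [osd] section of ceph.conf on the hosts serving the affected PG (the placement is an assumption; the values are from Sam's message quoted below):

```ini
# Hypothetical ceph.conf fragment; the section placement is an assumption.
[osd]
# Verbose OSD logging, as requested by Sam
debug osd = 20
# Messenger (network) logging
debug ms = 1
# Verbose filestore logging
debug filestore = 20
```

Logging at these levels is very verbose, so it would make sense to revert the settings once an unfound object has been captured in the logs.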

---
Diego Castro / The CloudFather
GetupCloud.com - Eliminamos a Gravidade

2016-06-01 13:39 GMT-03:00 Samuel Just <sj...@redhat.com>:

> Can either of you reproduce with logs? That would make it a lot
> easier to track down if it's a bug. I'd want
>
> debug osd = 20
> debug ms = 1
> debug filestore = 20
>
> on all of the OSDs for a particular PG, from when it is clean until it
> develops an unfound object.
> -Sam
>
> On Wed, Jun 1, 2016 at 5:36 AM, Diego Castro
> <diego.cas...@getupcloud.com> wrote:
> > Hello Uwe, I also have the sortbitwise flag enabled and I see exactly
> > the same behavior as you.
> > Perhaps this is also the root of my issues. Does anybody know if it is
> > safe to disable it?
> >
> > ---
> > Diego Castro / The CloudFather
> > GetupCloud.com - Eliminamos a Gravidade
> >
> > 2016-06-01 7:17 GMT-03:00 Uwe Mesecke <u...@mesecke.net>:
> >>
> >> > On 01.06.2016 at 10:25, Diego Castro
> >> > <diego.cas...@getupcloud.com> wrote:
> >> >
> >> > Hello, I have a cluster running Jewel 10.2.0, 25 OSDs + 4 MONs.
> >> > Today my cluster suddenly went unhealthy with lots of stuck PGs due
> >> > to unfound objects. No disk failures or node crashes, it just went bad.
> >> >
> >> > I managed to get the cluster healthy again by marking the lost
> >> > objects for deletion with "ceph pg <id> mark_unfound_lost delete".
> >> > I still have no idea why the cluster went bad, but I noticed that
> >> > restarting the OSD daemons to unlock stuck clients made the cluster
> >> > unhealthy again, with PGs stuck once more due to unfound objects.
> >> >
> >> > Does anyone else have this issue?
> >>
> >> Hi,
> >>
> >> I also ran into that problem after upgrading to Jewel. In my case I was
> >> able to somewhat correlate this behavior with setting the sortbitwise
> >> flag after the upgrade. When the flag is set, after some time these
> >> unfound objects start popping up. Restarting OSDs just makes it worse
> >> and/or makes these problems appear faster. When looking at the missing
> >> objects I can see that sometimes even region or zone configuration
> >> objects for radosgw are missing, which I know are there because the
> >> radosgw was using them just before.
> >>
> >> After unsetting the sortbitwise flag, the PGs go back to normal, all
> >> previously unfound objects are found and the cluster becomes healthy
> >> again.
> >>
> >> Of course I'm not sure whether this is the real root of the problem or
> >> just a coincidence, but I can reproduce this behavior every time.
> >>
> >> So for now the cluster is running without this flag. :-/
> >>
> >> Regards,
> >> Uwe
> >>
> >> >
> >> > ---
> >> > Diego Castro / The CloudFather
> >> > GetupCloud.com - Eliminamos a Gravidade

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com