Have you ever actually used the FS? It's missing an object that we intermittently see fail to be created (on initial setup) when the cluster is unstable. If that's what happened here and there's nothing on the FS you need, clear out the metadata pool and check the docs for "newfs". -Greg
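PS: Off the top of my head, and assuming the default "metadata" and "data" pool
names, the reset would go roughly like this (treat it as a sketch and check the
exact syntax against the 0.67 docs for your setup):

  # stop the MDS daemons first; they will just keep hitting the same assert in replay
  service ceph -a stop mds

  # confirm the sessionmap object really is gone from the metadata pool
  rados -p metadata ls | grep sessionmap

  # note the numeric ids of the metadata and data pools
  ceph osd dump | grep '^pool'

  # throw away the old metadata pool and recreate it (this destroys all FS metadata)
  ceph osd pool delete metadata metadata --yes-i-really-really-mean-it
  ceph osd pool create metadata <pg-num>

  # build a fresh, empty filesystem on the given pools; newfs takes pool *ids*, not names
  ceph mds newfs <new-metadata-pool-id> <data-pool-id> --yes-i-really-mean-it

  # start the MDS back up and check that it goes active
  service ceph -a start mds
  ceph mds stat

Data in your other RADOS pools isn't touched by this; only the CephFS metadata
goes away (and with it access to whatever files were on the FS).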
On Monday, August 19, 2013, Georg Höllrigl wrote:
> Hello List,
>
> The troubles to fix such a cluster continue... I get output like this now:
>
> # ceph health
> HEALTH_WARN 192 pgs degraded; 192 pgs stuck unclean; mds cluster is
> degraded; mds vvx-ceph-m-03 is laggy
>
> When checking for the ceph-mds processes, there are now none left - no
> matter which server I check. And they won't start up again!?
>
> The log starts with:
> 2013-08-19 11:23:30.503214 7f7e9dfbd780  0 ceph version 0.67
> (e3b7bc5bce8ab330ec1661381072368af3c218a0), process ceph-mds, pid 27636
> 2013-08-19 11:23:30.523314 7f7e9904b700  1 mds.-1.0 handle_mds_map standby
> 2013-08-19 11:23:30.529418 7f7e9904b700  1 mds.0.26 handle_mds_map i am now mds.0.26
> 2013-08-19 11:23:30.529423 7f7e9904b700  1 mds.0.26 handle_mds_map state change up:standby --> up:replay
> 2013-08-19 11:23:30.529426 7f7e9904b700  1 mds.0.26 replay_start
> 2013-08-19 11:23:30.529434 7f7e9904b700  1 mds.0.26 recovery set is
> 2013-08-19 11:23:30.529436 7f7e9904b700  1 mds.0.26 need osdmap epoch 277, have 276
> 2013-08-19 11:23:30.529438 7f7e9904b700  1 mds.0.26 waiting for osdmap 277 (which blacklists prior instance)
> 2013-08-19 11:23:30.534090 7f7e9904b700 -1 mds.0.sessionmap _load_finish got (2) No such file or directory
> 2013-08-19 11:23:30.535483 7f7e9904b700 -1 mds/SessionMap.cc: In function 'void SessionMap::_load_finish(int, ceph::bufferlist&)' thread 7f7e9904b700 time 2013-08-19 11:23:30.534107
> mds/SessionMap.cc: 83: FAILED assert(0 == "failed to load sessionmap")
>
> Anyone have an idea how to get the cluster back running?
>
> Georg
>
> On 16.08.2013 16:23, Mark Nelson wrote:
>> Hi Georg,
>>
>> I'm not an expert on the monitors, but that's probably where I would
>> start. Take a look at your monitor logs and see if you can get a sense
>> for why one of your monitors is down. Some of the other devs who might
>> know if there are any known issues with recreating the OSDs and missing
>> PGs will probably be around later.
>>
>> Mark
>>
>> On 08/16/2013 08:21 AM, Georg Höllrigl wrote:
>>> Hello,
>>>
>>> I'm still evaluating Ceph - now a test cluster with the 0.67 dumpling
>>> release. I've created the setup with ceph-deploy from GIT.
>>> I've recreated a bunch of OSDs to give them another journal.
>>> There already was some test data on these OSDs.
>>> I've already recreated the missing PGs with "ceph pg force_create_pg".
>>>
>>> HEALTH_WARN 192 pgs stuck inactive; 192 pgs stuck unclean; 5 requests
>>> are blocked > 32 sec; mds cluster is degraded; 1 mons down, quorum
>>> 0,1,2 vvx-ceph-m-01,vvx-ceph-m-02,vvx-ceph-m-03
>>>
>>> Any idea how to fix the cluster, besides completely rebuilding it
>>> from scratch? What if such a thing happens in a production
>>> environment...
>>>
>>> The pgs from "ceph pg dump" have all looked like "creating" for some time now:
>>>
>>> 2.3d  0  0  0  0  0  0  0  creating  2013-08-16 13:43:08.186537
>>> 0'0  0:0  []  []  0'0  0.000000  0'0  0.000000
>>>
>>> Is there a way to just dump the data that was on the discarded OSDs?
>>>
>>> Kind Regards,
>>> Georg

--
Software Engineer #42 @ http://inktank.com | http://ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com