Hi Luke, (copying list back in)
You should stop all MDS services before attempting to use
--reset-journal (but make sure mons and OSDs are running). The status
of the mds map shouldn't make a difference.

John

On Tue, Mar 18, 2014 at 5:23 PM, Luke Jing Yuan <jyl...@mimos.my> wrote:
> Hi John,
>
> I noticed that while we were running with the --reset-journal option, ceph.log
> keeps showing something like the following lines:
>
> 2014-03-19 01:17:09.977892 mon.0 10.4.118.21:6789/0 192 : [INF] mdsmap
> e29851: 1/1/1 up {0=mon01=up:replay(laggy or crashed)}
>
> And the mdsmap epoch just keeps increasing. Is this what we should be
> expecting? Also, should we consider using the "ceph mds" command to fail the
> mds before running with --reset-journal?
>
> Apologies for asking so many times. Thanks in advance.
>
> Regards,
> Luke
>
> -----Original Message-----
> From: John Spray [mailto:john.sp...@inktank.com]
> Sent: Tuesday, 18 March, 2014 8:12 PM
> To: Luke Jing Yuan
> Cc: Wong Ming Tat; Mohd Bazli Ab Karim
> Subject: Re: [ceph-users] Ceph MDS replaying journal
>
> That command should be almost instant, so it sounds like it has become stuck.
> Run with "-d --debug-mds=20" to get more output. One way it can get stuck
> is if the "-i" argument doesn't correspond to the host you're running on, in
> which case it gets stuck trying to find keys. I put "-i mon0" in the example
> command because that looked like the host you were running on, but perhaps
> you're running from somewhere else.
>
> I must emphasize that this is all very unsupported. If you have data that is
> critical for your users you should preferably restore it from backups.
>
> John
>
> On Tue, Mar 18, 2014 at 12:04 PM, Luke Jing Yuan <jyl...@mimos.my> wrote:
>> Hi John,
>>
>> We are using the 2nd option you mentioned, but after more than 10 hours of
>> running we have no idea whether it's working or when it will complete. Is
>> there any way for us to further monitor the progress?
>> We dare not use the newfs option as there are data that are critical to
>> our users. Kindly advise.
>>
>> Thanks.
>>
>> Regards,
>> Luke
>>
>> -----Original Message-----
>> From: ceph-users-boun...@lists.ceph.com
>> [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Luke Jing Yuan
>> Sent: Tuesday, 18 March, 2014 2:33 PM
>> To: John Spray
>> Cc: ceph-users@lists.ceph.com
>> Subject: Re: [ceph-users] Ceph MDS replaying journal
>>
>> Hi John,
>>
>> Is there a way for us to verify that step 2 is working properly? We are
>> seeing the process running for almost 4 hours but there is no indication
>> when it will end. Thanks.
>>
>> Regards,
>> Luke
>>
>> -----Original Message-----
>> From: John Spray [mailto:john.sp...@inktank.com]
>> Sent: Tuesday, 18 March, 2014 5:13 AM
>> To: Luke Jing Yuan
>> Cc: Wong Ming Tat; ceph-users@lists.ceph.com
>> Subject: Re: [ceph-users] Ceph MDS replaying journal
>>
>> Thanks for sending the logs so quickly.
>>
>> 626 2014-03-18 00:58:01.009623 7fba5cbbe700 10 mds.0.journal
>> EMetaBlob.replay sessionmap v8632368 -(1|2) == table 7235981 prealloc
>> [1000041df86~1] used 1000041db9e
>> 627 2014-03-18 00:58:01.009627 7fba5cbbe700 20 mds.0.journal (session
>> prealloc [10000373451~3e8])
>> 628 2014-03-18 00:58:01.010696 7fba5cbbe700 -1 mds/journal.cc: In function
>> 'void EMetaBlob::replay(MDS*, LogSegment*, MDSlaveUpdate*)'
>> thread 7fba5cbbe700 time 2014-03-18 00:58:01.009644
>>
>> The first line indicates that the version of SessionMap loaded from disk is
>> 7235981 while the version updated in the journal is 8632368.
>> The difference is much larger than one would expect, as we are only a few
>> events into the journal at the point of the failure. The assertion is
>> checking that the inode claimed by the journal is in the range allocated to
>> the client session, and it is failing because the stale sessionmap version
>> is in use.
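
As a rough illustration of the comparison described above, the sketch below pulls the two version numbers out of the first quoted log line and computes the gap between them. The regular expression and the function name sessionmap_gap are assumptions made for this example, based only on the single log line shown here; they are not part of any Ceph tool, and the line layout may differ in other versions.

```python
import re

# Assumed pattern, derived only from the one EMetaBlob.replay debug line
# quoted in this thread; other Ceph versions may format the line differently.
SESSIONMAP_RE = re.compile(r"sessionmap v(\d+) .*?== table (\d+)")


def sessionmap_gap(log_line):
    """Return (journal_version, table_version, gap), or None if no match."""
    match = SESSIONMAP_RE.search(log_line)
    if match is None:
        return None
    journal_version = int(match.group(1))  # version recorded in the journal
    table_version = int(match.group(2))    # version loaded from disk
    return journal_version, table_version, journal_version - table_version


line = (
    "2014-03-18 00:58:01.009623 7fba5cbbe700 10 mds.0.journal "
    "EMetaBlob.replay sessionmap v8632368 -(1|2) == table 7235981 prealloc "
    "[1000041df86~1] used 1000041db9e"
)
print(sessionmap_gap(line))  # (8632368, 7235981, 1396387)
```

A gap of well over a million versions only a few events into the journal is what points to the on-disk SessionMap being stale.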
>>
>> In version 0.72.2, there was a bug in the MDS that caused failures to
>> write the SessionMap object to disk to be ignored. This could result
>> in a situation where there is an inconsistency between the contents of
>> the log and the contents of the SessionMap object. A check was added
>> to avoid this in the latest code (b0dce8a0).
>>
>> In a future release we will be adding tools for repairing damaged systems in
>> cases like this, but at the moment your options are quite limited.
>> * If the data is replaceable then you might simply use "ceph mds newfs" to
>>   start from scratch.
>> * If you can cope with losing some of the most recent modifications but
>>   keeping most of the filesystem, you could try the experimental journal
>>   reset function:
>>     ceph-mds -i mon0 -d --reset-journal 0
>>   This is destructive: it will discard any metadata updates that have been
>>   written to the journal but not to the backing store. However, it is less
>>   destructive than newfs. It may crash when it completes; look for output
>>   like this at the beginning, before any stack trace, to indicate success:
>>     writing journal head
>>     writing EResetJournal entry
>>     done
>>
>> We are looking forward to making the MDS and associated tools more resilient
>> ahead of making the filesystem a fully supported part of ceph.
>>
>> John
>>
>> On Mon, Mar 17, 2014 at 5:09 PM, Luke Jing Yuan <jyl...@mimos.my> wrote:
>>> Hi John,
>>>
>>> Thanks for responding to our issues, attached is the ceph.log file as per
>>> request. As for the ceph-mds.log, I will have to send it in 3 parts later
>>> due to our SMTP server's policy.
>>>
>>> Regards,
>>> Luke
>>>
>>> -----Original Message-----
>>> From: ceph-users-boun...@lists.ceph.com
>>> [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of John Spray
>>> Sent: Tuesday, 18 March, 2014 12:57 AM
>>> To: Wong Ming Tat
>>> Cc: ceph-users@lists.ceph.com
>>> Subject: Re: [ceph-users] Ceph MDS replaying journal
>>>
>>> Clarification: in step 1, stop the MDS service on *all* MDS servers (I
>>> notice there are standby daemons in the "ceph status" output).
>>>
>>> John
>>>
>>> On Mon, Mar 17, 2014 at 4:45 PM, John Spray <john.sp...@inktank.com> wrote:
>>>> Hello,
>>>>
>>>> To understand what's gone wrong here, we'll need to increase the
>>>> verbosity of the logging from the MDS service and then try
>>>> starting it again.
>>>>
>>>> 1. Stop the MDS service (on ubuntu this would be "stop
>>>>    ceph-mds-all")
>>>> 2. Move your old log file away so that we will have a fresh one:
>>>>      mv /var/log/ceph/ceph-mds.mon01.log /var/log/ceph/ceph-mds.mon01.log.old
>>>> 3. Start the mds service manually (so that it just tries once
>>>>    instead of flapping):
>>>>      ceph-mds -i mon01 -f --debug-mds=20 --debug-journaler=10
>>>>
>>>> The resulting log file may be quite big so you may want to gzip it
>>>> before sending it to the list.
>>>>
>>>> In addition to the MDS log, please attach your cluster log
>>>> (/var/log/ceph/ceph.log).
>>>>
>>>> Thanks,
>>>> John
>>>>
>>>> On Mon, Mar 17, 2014 at 7:02 AM, Wong Ming Tat <mt.w...@mimos.my> wrote:
>>>>> Hi,
>>>>>
>>>>> I received the MDS replaying journal error as below.
>>>>>
>>>>> I hope someone can give some information to solve this problem.
>>>>>
>>>>> # ceph health detail
>>>>> HEALTH_WARN mds cluster is degraded
>>>>> mds cluster is degraded
>>>>> mds.mon01 at x.x.x.x:6800/26426 rank 0 is replaying journal
>>>>>
>>>>> # ceph -s
>>>>> cluster xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>>>>> health HEALTH_WARN mds cluster is degraded
>>>>> monmap e1: 3 mons at
>>>>> {mon01=x.x.x.x:6789/0,mon02=x.x.x.y:6789/0,mon03=x.x.x.z:6789/0},
>>>>> election epoch 1210, quorum 0,1,2 mon01,mon02,mon03
>>>>> mdsmap e17020: 1/1/1 up {0=mon01=up:replay}, 2 up:standby
>>>>> osdmap e20195: 24 osds: 24 up, 24 in
>>>>> pgmap v1424671: 3300 pgs, 6 pools, 793 GB data, 3284 kobjects
>>>>> 1611 GB used, 87636 GB / 89248 GB avail
>>>>> 3300 active+clean
>>>>> client io 2750 kB/s rd, 0 op/s
>>>>>
>>>>> # cat /var/log/ceph/ceph-mds.mon01.log
>>>>> 2014-03-16 18:40:41.894404 7f0f2875c700 0 mds.0.server
>>>>> handle_client_file_setlock: start: 0, length: 0, client: 324186, pid:
>>>>> 30684, pid_ns: 18446612141968944256, type: 4
>>>>>
>>>>> 2014-03-16 18:49:09.993985 7f0f24645700 0 -- x.x.x.x:6801/3739 >>
>>>>> y.y.y.y:0/1662262473 pipe(0x728d2780 sd=26 :6801 s=0 pgs=0 cs=0 l=0
>>>>> c=0x100adc6e0).accept peer addr is really y.y.y.y:0/1662262473
>>>>> (socket is y.y.y.y:33592/0)
>>>>>
>>>>> 2014-03-16 18:49:10.000197 7f0f24645700 0 -- x.x.x.x:6801/3739 >>
>>>>> y.y.y.y:0/1662262473 pipe(0x728d2780 sd=26 :6801 s=0 pgs=0 cs=0 l=0
>>>>> c=0x100adc6e0).accept connect_seq 0 vs existing 1 state standby
>>>>>
>>>>> 2014-03-16 18:49:10.000239 7f0f24645700 0 -- x.x.x.x:6801/3739 >>
>>>>> y.y.y.y:0/1662262473 pipe(0x728d2780 sd=26 :6801 s=0 pgs=0 cs=0 l=0
>>>>> c=0x100adc6e0).accept peer reset, then tried to connect to us,
>>>>> replacing
>>>>>
>>>>> 2014-03-16 18:49:10.550726 7f4c34671780 0 ceph version 0.72.2
>>>>> (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid
>>>>> 13282
>>>>>
>>>>> 2014-03-16 18:49:10.826713 7f4c2f6f8700 1 mds.-1.0 handle_mds_map standby
>>>>>
>>>>> 2014-03-16 18:49:10.984992 7f4c2f6f8700 1 mds.0.14 handle_mds_map
>>>>> i am now mds.0.14
>>>>>
>>>>> 2014-03-16 18:49:10.985010 7f4c2f6f8700 1 mds.0.14 handle_mds_map
>>>>> state change up:standby --> up:replay
>>>>>
>>>>> 2014-03-16 18:49:10.985017 7f4c2f6f8700 1 mds.0.14 replay_start
>>>>>
>>>>> 2014-03-16 18:49:10.985024 7f4c2f6f8700 1 mds.0.14 recovery set is
>>>>>
>>>>> 2014-03-16 18:49:10.985027 7f4c2f6f8700 1 mds.0.14 need osdmap
>>>>> epoch 3446, have 3445
>>>>>
>>>>> 2014-03-16 18:49:10.985030 7f4c2f6f8700 1 mds.0.14 waiting for
>>>>> osdmap 3446 (which blacklists prior instance)
>>>>>
>>>>> 2014-03-16 18:49:16.945500 7f4c2f6f8700 0 mds.0.cache creating
>>>>> system inode with ino:100
>>>>>
>>>>> 2014-03-16 18:49:16.945747 7f4c2f6f8700 0 mds.0.cache creating
>>>>> system inode with ino:1
>>>>>
>>>>> 2014-03-16 18:49:17.358681 7f4c2b5e1700 -1 mds/journal.cc: In
>>>>> function 'void EMetaBlob::replay(MDS*, LogSegment*, MDSlaveUpdate*)'
>>>>> thread 7f4c2b5e1700 time 2014-03-16 18:49:17.356336
>>>>>
>>>>> mds/journal.cc: 1316: FAILED assert(i == used_preallocated_ino)
>>>>>
>>>>> ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60)
>>>>> 1: (EMetaBlob::replay(MDS*, LogSegment*, MDSlaveUpdate*)+0x7587)
>>>>> [0x5af5e7]
>>>>> 2: (EUpdate::replay(MDS*)+0x3a) [0x5b67ea]
>>>>> 3: (MDLog::_replay_thread()+0x678) [0x79dbb8]
>>>>> 4: (MDLog::ReplayThread::entry()+0xd) [0x58bded]
>>>>> 5: (()+0x7e9a) [0x7f4c33a96e9a]
>>>>> 6: (clone()+0x6d) [0x7f4c3298b3fd]
>>>>>
>>>>> Regards,
>>>>> Wong Ming Tat
>>>>>
>>>>> ________________________________
>>>>> DISCLAIMER:
>>>>>
>>>>> This e-mail (including any attachments) is for the addressee(s)
>>>>> only and may be confidential, especially as regards personal data.
>>>>> If you are not the intended recipient, please note that any
>>>>> dealing, review, distribution, printing, copying or use of this
>>>>> e-mail is strictly prohibited. If you have received this email in
>>>>> error, please notify the sender immediately and delete the original
>>>>> message (including any attachments).
>>>>>
>>>>> MIMOS Berhad is a research and development institution under the
>>>>> purview of the Malaysian Ministry of Science, Technology and
>>>>> Innovation. Opinions, conclusions and other information in this
>>>>> e-mail that do not relate to the official business of MIMOS Berhad
>>>>> and/or its subsidiaries shall be understood as neither given nor
>>>>> endorsed by MIMOS Berhad and/or its subsidiaries and neither MIMOS
>>>>> Berhad nor its subsidiaries accepts responsibility for the same.
>>>>> All liability arising from or in connection with computer viruses
>>>>> and/or corrupted e-mails is excluded to the fullest extent permitted by
>>>>> law.
>>>>>
>>>>> _______________________________________________
>>>>> ceph-users mailing list
>>>>> ceph-users@lists.ceph.com
>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
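
For the --reset-journal step discussed earlier in this thread, the success indication is the three lines "writing journal head", "writing EResetJournal entry" and "done" appearing before any stack trace. A minimal sketch of checking captured output for those lines in order might look like the following. The function name reset_journal_succeeded is hypothetical, and the exact-line matching is an assumption: if the real output carries timestamps or other prefixes, the comparison would need to be loosened.

```python
# Marker lines quoted in this thread as the sign of a successful reset.
SUCCESS_MARKERS = [
    "writing journal head",
    "writing EResetJournal entry",
    "done",
]


def reset_journal_succeeded(output):
    """Return True if all marker lines appear, in order, in the output.

    Assumes each marker is printed alone on its own line (an assumption
    for this sketch, not something confirmed for every Ceph version).
    """
    remaining = iter(line.strip() for line in output.splitlines())
    # Each any() call consumes the iterator up to the matching line, so
    # the next marker is only searched for in the lines that follow it.
    return all(
        any(line == marker for line in remaining)
        for marker in SUCCESS_MARKERS
    )


captured = "writing journal head\nwriting EResetJournal entry\ndone\n"
print(reset_journal_succeeded(captured))
```

The output would have to be captured to a file or variable first, since the command may crash right after printing these lines.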