On Wed, Apr 30, 2014 at 3:07 PM, Mohd Bazli Ab Karim <bazli.abka...@mimos.my> wrote:
> Hi Zheng,
>
> Sorry for the late reply. For sure, I will try this again after we have
> completely verified all content in the file system. Hopefully all will be good.
> And, please confirm this: I will set debug_mds=10 for the ceph-mds, and do
> you want me to send the ceph-mon log too?

yes please.

>
> BTW, how can I confirm whether the mds has passed the beacon to the mon or not?
>

read the monitor's log

Regards
Yan, Zheng

> Thank you so much Zheng!
>
> Bazli
>
> -----Original Message-----
> From: Yan, Zheng [mailto:uker...@gmail.com]
> Sent: Tuesday, April 29, 2014 10:13 PM
> To: Mohd Bazli Ab Karim
> Cc: Luke Jing Yuan; Wong Ming Tat
> Subject: Re: [ceph-users] Ceph mds laggy and failed assert in function replay
> mds/journal.cc
>
> On Tue, Apr 29, 2014 at 5:30 PM, Mohd Bazli Ab Karim <bazli.abka...@mimos.my>
> wrote:
>> Hi Zheng,
>>
>> The other issue that Luke mentioned just now was like this.
>> At first, we ran one mds (mon01) with the newly compiled ceph-mds. It worked
>> fine with only one MDS running at that time. However, when we ran two more
>> MDSes, mon02 and mon03, with the newly compiled ceph-mds, it started acting weird.
>> Mon01, which became active first, hit the error and started
>> respawning. Once respawning happened, mon03 took over from mon01 as
>> master mds, and replay happened again.
>> Again, when mon03 became active, it hit the same error as below and
>> respawned again. So, it seems to me that replay will keep moving
>> from one mds to another each time they respawn.
>>
>> 2014-04-29 15:36:24.917798 7f5c36476700 1 mds.0.server reconnect_clients -- 1 sessions
>> 2014-04-29 15:36:24.919620 7f5c2fb3e700 0 -- 10.4.118.23:6800/26401 >> 10.1.64.181:0/1558263174 pipe(0x2924f5780 sd=41 :6800 s=0 pgs=0 cs=0 l=0 c=0x37056e0).accept peer addr is really 10.1.64.181:0/1558263174 (socket is 10.1.64.181:57649/0)
>> 2014-04-29 15:36:24.921661 7f5c36476700 0 log [DBG] : reconnect by client.884169 10.1.64.181:0/1558263174 after 0.003774
>> 2014-04-29 15:36:24.921786 7f5c36476700 1 mds.0.12858 reconnect_done
>> 2014-04-29 15:36:25.109391 7f5c36476700 1 mds.0.12858 handle_mds_map i am now mds.0.12858
>> 2014-04-29 15:36:25.109413 7f5c36476700 1 mds.0.12858 handle_mds_map state change up:reconnect --> up:rejoin
>> 2014-04-29 15:36:25.109417 7f5c36476700 1 mds.0.12858 rejoin_start
>> 2014-04-29 15:36:26.918067 7f5c36476700 1 mds.0.12858 rejoin_joint_start
>> 2014-04-29 15:36:33.520985 7f5c36476700 1 mds.0.12858 rejoin_done
>> 2014-04-29 15:36:36.252925 7f5c36476700 1 mds.0.12858 handle_mds_map i am now mds.0.12858
>> 2014-04-29 15:36:36.252927 7f5c36476700 1 mds.0.12858 handle_mds_map state change up:rejoin --> up:active
>> 2014-04-29 15:36:36.252932 7f5c36476700 1 mds.0.12858 recovery_done -- successful recovery!
>> 2014-04-29 15:36:36.745833 7f5c36476700 1 mds.0.12858 active_start
>> 2014-04-29 15:36:36.987854 7f5c36476700 1 mds.0.12858 cluster recovered.
>> 2014-04-29 15:36:40.182604 7f5c36476700 0 mds.0.12858 handle_mds_beacon no longer laggy
>> 2014-04-29 15:36:57.947441 7f5c2fb3e700 0 -- 10.4.118.23:6800/26401 >> 10.1.64.181:0/1558263174 pipe(0x2924f5780 sd=41 :6800 s=2 pgs=156 cs=1 l=0 c=0x37056e0).fault with nothing to send, going to standby
>> 2014-04-29 15:37:10.534593 7f5c36476700 1 mds.-1.-1 handle_mds_map i (10.4.118.23:6800/26401) dne in the mdsmap, respawning myself
>> 2014-04-29 15:37:10.534604 7f5c36476700 1 mds.-1.-1 respawn
>> 2014-04-29 15:37:10.534609 7f5c36476700 1 mds.-1.-1 e: '/usr/bin/ceph-mds'
>> 2014-04-29 15:37:10.534612 7f5c36476700 1 mds.-1.-1 0: '/usr/bin/ceph-mds'
>> 2014-04-29 15:37:10.534616 7f5c36476700 1 mds.-1.-1 1: '--cluster=ceph'
>> 2014-04-29 15:37:10.534619 7f5c36476700 1 mds.-1.-1 2: '-i'
>> 2014-04-29 15:37:10.534621 7f5c36476700 1 mds.-1.-1 3: 'mon03'
>> 2014-04-29 15:37:10.534623 7f5c36476700 1 mds.-1.-1 4: '-f'
>> 2014-04-29 15:37:10.534641 7f5c36476700 1 mds.-1.-1 cwd /
>> 2014-04-29 15:37:12.155458 7f8907c8b780 0 ceph version (), process ceph-mds, pid 26401
>> 2014-04-29 15:37:12.249780 7f8902d10700 1 mds.-1.0 handle_mds_map standby
>>
>> p/s. we ran ceph-mon and ceph-mds on the same servers (mon01, mon02, mon03)
>>
>> I sent you two log files, mon01 and mon03, covering the scenario where mon03
>> went through standby->replay->active->respawned, and also mon01, which is now
>> running as the single active MDS at this moment.
>>
>
> After the MDS became active, it did not send a beacon to the monitor. It seems
> like the MDS was busy doing something else. If this issue still happens, set
> debug_mds=10 and send the log to me.
>
> Regards
> Yan, Zheng
>
>> Regards,
>> Bazli
>> -----Original Message-----
>> From: Luke Jing Yuan
>> Sent: Tuesday, April 29, 2014 4:46 PM
>> To: Yan, Zheng
>> Cc: Mohd Bazli Ab Karim; Wong Ming Tat
>> Subject: RE: [ceph-users] Ceph mds laggy and failed assert in function
>> replay mds/journal.cc
>>
>> Hi Zheng,
>>
>> Thanks for the information. Actually we encountered another issue. In our
>> original setup, we have 3 MDS running (say mon01, mon02 and mon03). When we
>> did the replay/recovery, we did it on mon01. After it completed, we restarted
>> the mds on mon02 and mon03 (without mds_wipe_sessions and using
>> the patched binary), and that was when we noticed something else. My
>> colleague Mr. Bazli, who started the original thread, can probably explain a
>> bit more about the observations made.
>>
>> Regards,
>> Luke
>>
>> -----Original Message-----
>> From: Yan, Zheng [mailto:uker...@gmail.com]
>> Sent: Tuesday, 29 April, 2014 4:26 PM
>> To: Luke Jing Yuan
>> Subject: Re: [ceph-users] Ceph mds laggy and failed assert in function
>> replay mds/journal.cc
>>
>> On Tue, Apr 29, 2014 at 3:43 PM, Luke Jing Yuan <jyl...@mimos.my> wrote:
>>> Hi,
>>>
>>> The MDS did finish the replay and is working after that, but we are wondering
>>> whether we should leave mds_wipe_sessions in ceph.conf or remove it.
>>>
>>
>> You should disable mds_wipe_sessions after the mds starts working.
>>
>>> Regards,
>>> Luke
>>>
>>> -----Original Message-----
>>> From: ceph-users-boun...@lists.ceph.com
>>> [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Yan, Zheng
>>> Sent: Tuesday, 29 April, 2014 3:36 PM
>>> To: Jingyuan Luke
>>> Cc: ceph-de...@vger.kernel.org; ceph-users@lists.ceph.com
>>> Subject: Re: [ceph-users] Ceph mds laggy and failed assert in
>>> function replay mds/journal.cc
>>>
>>> On Tue, Apr 29, 2014 at 3:13 PM, Jingyuan Luke <jyl...@gmail.com> wrote:
>>>> Hi,
>>>>
>>>> Assuming we get the MDS working and back on track, should we still leave
>>>> mds_wipe_sessions in ceph.conf, or remove it and restart the MDS?
>>>> Thanks.
>>>
>>> No.
>>>
>>> It has been several hours. Has the MDS still not finished replaying the
>>> journal?
>>>
>>> Regards
>>> Yan, Zheng
>>>
>>> ________________________________
>>> DISCLAIMER:
>>>
>>> This e-mail (including any attachments) is for the addressee(s) only and
>>> may be confidential, especially as regards personal data. If you are not
>>> the intended recipient, please note that any dealing, review, distribution,
>>> printing, copying or use of this e-mail is strictly prohibited. If you have
>>> received this email in error, please notify the sender immediately and
>>> delete the original message (including any attachments).
>>>
>>> MIMOS Berhad is a research and development institution under the purview of
>>> the Malaysian Ministry of Science, Technology and Innovation. Opinions,
>>> conclusions and other information in this e-mail that do not relate to the
>>> official business of MIMOS Berhad and/or its subsidiaries shall be
>>> understood as neither given nor endorsed by MIMOS Berhad and/or its
>>> subsidiaries and neither MIMOS Berhad nor its subsidiaries accepts
>>> responsibility for the same.
>>> All liability arising from or in connection with computer viruses and/or
>>> corrupted e-mails is excluded to the fullest extent permitted by law.

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
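
The respawn loop discussed in this thread (an MDS reaches up:active, then drops out of the mdsmap and respawns) can be picked out of a raw ceph-mds log mechanically. The following is a minimal Python sketch, not a Ceph tool: the function names trace_mds_events and respawned_after_active are hypothetical, and the patterns are assumptions taken only from the log lines quoted in this thread ("state change A --> B", "respawning myself", "handle_mds_beacon no longer laggy"). It expects unquoted log lines, so strip any leading ">" mail quoting first. It does not answer the monitor-side question of whether beacons arrived; for that, the advice in the thread is to read the monitor's log.

```python
import re
from typing import Iterable, List, Tuple

# Patterns assumed from the log excerpt quoted in the thread.
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)")
STATE_CHANGE = re.compile(r"state change (up:\w+) --> (up:\w+)")


def trace_mds_events(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (timestamp, event) pairs for state changes, beacon
    recovery messages, and respawns found in raw MDS log lines."""
    events = []
    for line in lines:
        ts_match = TIMESTAMP.match(line)
        if not ts_match:
            continue  # not a timestamped log line
        ts = ts_match.group(1)
        change = STATE_CHANGE.search(line)
        if change:
            events.append((ts, f"{change.group(1)} -> {change.group(2)}"))
        elif "respawning myself" in line:
            events.append((ts, "respawn"))
        elif "handle_mds_beacon no longer laggy" in line:
            events.append((ts, "no longer laggy"))
    return events


def respawned_after_active(events: List[Tuple[str, str]]) -> bool:
    """True if a respawn follows a transition into up:active."""
    seen_active = False
    for _, what in events:
        if what.endswith("-> up:active"):
            seen_active = True
        elif what == "respawn" and seen_active:
            return True
    return False


# Four lines copied from the log quoted in the thread.
SAMPLE = [
    "2014-04-29 15:36:25.109413 7f5c36476700 1 mds.0.12858 handle_mds_map state change up:reconnect --> up:rejoin",
    "2014-04-29 15:36:36.252927 7f5c36476700 1 mds.0.12858 handle_mds_map state change up:rejoin --> up:active",
    "2014-04-29 15:36:40.182604 7f5c36476700 0 mds.0.12858 handle_mds_beacon no longer laggy",
    "2014-04-29 15:37:10.534593 7f5c36476700 1 mds.-1.-1 handle_mds_map i (10.4.118.23:6800/26401) dne in the mdsmap, respawning myself",
]

if __name__ == "__main__":
    for ts, what in trace_mds_events(SAMPLE):
        print(ts, what)
    print("respawned after active:", respawned_after_active(trace_mds_events(SAMPLE)))
```

Running the same functions over the full logs from mon01 and mon03 would show whether each daemon follows the same active-then-respawn pattern, which is the symptom reported above.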