Sadly, undump has been broken for quite some time (it was fixed in Giant as part of creating cephfs-journal-tool). If there's a one-line fix for this then it's probably worth putting it in Firefly, since that's a long-term supported branch -- I'll do that now.
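For anyone on Giant or later who finds this thread: the dump/reset/undump cycle discussed below is handled there by cephfs-journal-tool rather than the ceph-mds flags. As a rough sketch only (assuming rank 0, the MDS stopped, and "journal-backup.bin" as an example filename -- check the tool's help before running anything):

  cephfs-journal-tool journal export journal-backup.bin   # dump the journal to a file
  cephfs-journal-tool journal reset                        # replace it with a fresh, empty journal
  cephfs-journal-tool journal import journal-backup.bin   # undump the saved copy back in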
John

On Wed, Oct 15, 2014 at 8:23 AM, Jasper Siero <jasper.si...@target-holding.nl> wrote:
> Hello Greg,
>
> The dump and reset of the journal were successful:
>
> [root@th1-mon001 ~]# /usr/bin/ceph-mds -i th1-mon001 --pid-file /var/run/ceph/mds.th1-mon001.pid -c /etc/ceph/ceph.conf --cluster ceph --dump-journal 0 journaldumptgho-mon001
> journal is 9483323613~134215459
> read 134213311 bytes at offset 9483323613
> wrote 134213311 bytes at offset 9483323613 to journaldumptgho-mon001
> NOTE: this is a _sparse_ file; you can
>   $ tar cSzf journaldumptgho-mon001.tgz journaldumptgho-mon001
> to efficiently compress it while preserving sparseness.
>
> [root@th1-mon001 ~]# /usr/bin/ceph-mds -i th1-mon001 --pid-file /var/run/ceph/mds.th1-mon001.pid -c /etc/ceph/ceph.conf --cluster ceph --reset-journal 0
> old journal was 9483323613~134215459
> new journal start will be 9621733376 (4194304 bytes past old end)
> writing journal head
> writing EResetJournal entry
> done
>
> Undumping the journal was not successful; the error "client_lock.is_locked()" is shown several times. The mds is not running when I start the undumping, so maybe I have forgotten something?
>
> [root@th1-mon001 ~]# /usr/bin/ceph-mds -i th1-mon001 --pid-file /var/run/ceph/mds.th1-mon001.pid -c /etc/ceph/ceph.conf --cluster ceph --undump-journal 0 journaldumptgho-mon001
> undump journaldumptgho-mon001
> start 9483323613 len 134213311
> writing header 200.00000000
> osdc/Objecter.cc: In function 'ceph_tid_t Objecter::op_submit(Objecter::Op*)' thread 7fec3e5ad7a0 time 2014-10-15 09:09:32.020287
> osdc/Objecter.cc: 1225: FAILED assert(client_lock.is_locked())
> ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6)
> 1: /usr/bin/ceph-mds() [0x80f15e]
> 2: (Dumper::undump(char const*)+0x65d) [0x56c7ad]
> 3: (main()+0x1632) [0x569c62]
> 4: (__libc_start_main()+0xfd) [0x7fec3ca68d5d]
> 5: /usr/bin/ceph-mds() [0x567d99]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
> 2014-10-15 09:09:32.021313 7fec3e5ad7a0 -1 osdc/Objecter.cc: In function 'ceph_tid_t Objecter::op_submit(Objecter::Op*)' thread 7fec3e5ad7a0 time 2014-10-15 09:09:32.020287
> osdc/Objecter.cc: 1225: FAILED assert(client_lock.is_locked())
>
> ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6)
> 1: /usr/bin/ceph-mds() [0x80f15e]
> 2: (Dumper::undump(char const*)+0x65d) [0x56c7ad]
> 3: (main()+0x1632) [0x569c62]
> 4: (__libc_start_main()+0xfd) [0x7fec3ca68d5d]
> 5: /usr/bin/ceph-mds() [0x567d99]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> 0> 2014-10-15 09:09:32.021313 7fec3e5ad7a0 -1 osdc/Objecter.cc: In function 'ceph_tid_t Objecter::op_submit(Objecter::Op*)' thread 7fec3e5ad7a0 time 2014-10-15 09:09:32.020287
> osdc/Objecter.cc: 1225: FAILED assert(client_lock.is_locked())
>
> ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6)
> 1: /usr/bin/ceph-mds() [0x80f15e]
> 2: (Dumper::undump(char const*)+0x65d) [0x56c7ad]
> 3: (main()+0x1632) [0x569c62]
> 4: (__libc_start_main()+0xfd) [0x7fec3ca68d5d]
> 5: /usr/bin/ceph-mds() [0x567d99]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> terminate called after throwing an instance of 'ceph::FailedAssertion'
> *** Caught signal (Aborted) **
> in thread 7fec3e5ad7a0
> ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6)
> 1: /usr/bin/ceph-mds() [0x82ef61]
> 2: (()+0xf710) [0x7fec3d9a6710]
> 3: (gsignal()+0x35) [0x7fec3ca7c635]
> 4: (abort()+0x175) [0x7fec3ca7de15]
> 5: (__gnu_cxx::__verbose_terminate_handler()+0x12d) [0x7fec3d336a5d]
> 6: (()+0xbcbe6) [0x7fec3d334be6]
> 7: (()+0xbcc13) [0x7fec3d334c13]
> 8: (()+0xbcd0e) [0x7fec3d334d0e]
> 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7f2) [0x94b812]
> 10: /usr/bin/ceph-mds() [0x80f15e]
> 11: (Dumper::undump(char const*)+0x65d) [0x56c7ad]
> 12: (main()+0x1632) [0x569c62]
> 13: (__libc_start_main()+0xfd) [0x7fec3ca68d5d]
> 14: /usr/bin/ceph-mds() [0x567d99]
> 2014-10-15 09:09:32.024248 7fec3e5ad7a0 -1 *** Caught signal (Aborted) **
> in thread 7fec3e5ad7a0
>
> ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6)
> 1: /usr/bin/ceph-mds() [0x82ef61]
> 2: (()+0xf710) [0x7fec3d9a6710]
> 3: (gsignal()+0x35) [0x7fec3ca7c635]
> 4: (abort()+0x175) [0x7fec3ca7de15]
> 5: (__gnu_cxx::__verbose_terminate_handler()+0x12d) [0x7fec3d336a5d]
> 6: (()+0xbcbe6) [0x7fec3d334be6]
> 7: (()+0xbcc13) [0x7fec3d334c13]
> 8: (()+0xbcd0e) [0x7fec3d334d0e]
> 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7f2) [0x94b812]
> 10: /usr/bin/ceph-mds() [0x80f15e]
> 11: (Dumper::undump(char const*)+0x65d) [0x56c7ad]
> 12: (main()+0x1632) [0x569c62]
> 13: (__libc_start_main()+0xfd) [0x7fec3ca68d5d]
> 14: /usr/bin/ceph-mds() [0x567d99]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> 0> 2014-10-15 09:09:32.024248 7fec3e5ad7a0 -1 *** Caught signal (Aborted) **
> in thread 7fec3e5ad7a0
>
> ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6)
> 1: /usr/bin/ceph-mds() [0x82ef61]
> 2: (()+0xf710) [0x7fec3d9a6710]
> 3: (gsignal()+0x35) [0x7fec3ca7c635]
> 4: (abort()+0x175) [0x7fec3ca7de15]
> 5: (__gnu_cxx::__verbose_terminate_handler()+0x12d) [0x7fec3d336a5d]
> 6: (()+0xbcbe6) [0x7fec3d334be6]
> 7: (()+0xbcc13) [0x7fec3d334c13]
> 8: (()+0xbcd0e) [0x7fec3d334d0e]
> 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7f2) [0x94b812]
> 10: /usr/bin/ceph-mds() [0x80f15e]
> 11: (Dumper::undump(char const*)+0x65d) [0x56c7ad]
> 12: (main()+0x1632) [0x569c62]
> 13: (__libc_start_main()+0xfd) [0x7fec3ca68d5d]
> 14: /usr/bin/ceph-mds() [0x567d99]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> Aborted
>
> Jasper
> ________________________________________
> From: Gregory Farnum [g...@inktank.com]
> Sent: Tuesday, October 14, 2014 23:40
> To: Jasper Siero
> CC: ceph-users
> Subject: Re: [ceph-users] mds isn't working anymore after osd's running full
>
> ceph-mds --undump-journal <rank> <journal-file>
> Looks like it accidentally (or on purpose? you can break things with it) got left out of the help text.
>
> On Tue, Oct 14, 2014 at 8:19 AM, Jasper Siero <jasper.si...@target-holding.nl> wrote:
>> Hello Greg,
>>
>> I dumped the journal successfully to a file:
>>
>> journal is 9483323613~134215459
>> read 134213311 bytes at offset 9483323613
>> wrote 134213311 bytes at offset 9483323613 to journaldumptgho
>> NOTE: this is a _sparse_ file; you can
>>   $ tar cSzf journaldumptgho.tgz journaldumptgho
>> to efficiently compress it while preserving sparseness.
>>
>> I see the option for resetting the mds journal but I can't find the option for undumping/importing the journal:
>>
>> usage: ceph-mds -i name [flags] [[--journal_check rank]|[--hot-standby][rank]]
>>   -m monitorip:port
>>         connect to monitor at given address
>>   --debug_mds n
>>         debug MDS level (e.g. 10)
>>   --dump-journal rank filename
>>         dump the MDS journal (binary) for rank.
>>   --dump-journal-entries rank filename
>>         dump the MDS journal (JSON) for rank.
>>   --journal-check rank
>>         replay the journal for rank, then exit
>>   --hot-standby rank
>>         start up as a hot standby for rank
>>   --reset-journal rank
>>         discard the MDS journal for rank, and replace it with a single
>>         event that updates/resets inotable and sessionmap on replay.
>>
>> Do you know how to "undump" the journal back into ceph?
>>
>> Jasper
>>
>> ________________________________________
>> From: Gregory Farnum [g...@inktank.com]
>> Sent: Friday, October 10, 2014 23:45
>> To: Jasper Siero
>> CC: ceph-users
>> Subject: Re: [ceph-users] mds isn't working anymore after osd's running full
>>
>> Ugh, "debug journaler", not "debug journaled."
>>
>> That said, the filer output tells me that you're missing an object out of the MDS log (200.000008f5). I think this issue should be resolved if you "dump" the journal to a file, "reset" it, and then "undump" it. (These are commands you can invoke from ceph-mds.)
>> I haven't done this myself in a long time, so there may be some hard edges around it. In particular, I'm not sure if the dumped journal file will stop when the data stops, or if it will be a little too long. If so, we can fix that by truncating the dumped file to the proper length and resetting and undumping again.
>> (And just to harp on it, this journal manipulation is a lot simpler in Giant... ;) )
>> -Greg
>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>
>> On Wed, Oct 8, 2014 at 7:11 AM, Jasper Siero <jasper.si...@target-holding.nl> wrote:
>>> Hello Greg,
>>>
>>> No problem, thanks for looking into the log. I attached the log to this email.
>>> I'm looking forward to the new release because it would be nice to have more possibilities to diagnose problems.
>>>
>>> Kind regards,
>>>
>>> Jasper Siero
>>> ________________________________________
>>> From: Gregory Farnum [g...@inktank.com]
>>> Sent: Tuesday, October 7, 2014 19:45
>>> To: Jasper Siero
>>> CC: ceph-users
>>> Subject: Re: [ceph-users] mds isn't working anymore after osd's running full
>>>
>>> Sorry; I guess this fell off my radar.
>>>
>>> The issue here is not that it's waiting for an osdmap; it got the requested map and went into replay mode almost immediately. In fact the log looks good except that it seems to finish replaying the log and then simply fail to transition into active. Generate a new one, adding in "debug journaled = 20" and "debug filer = 20", and we can probably figure out how to fix it.
>>> (This diagnosis is much easier in the upcoming Giant!)
>>> -Greg
>>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>>
>>>
>>> On Tue, Oct 7, 2014 at 7:55 AM, Jasper Siero <jasper.si...@target-holding.nl> wrote:
>>>> Hello Gregory,
>>>>
>>>> We still have the same problems with our test ceph cluster and didn't receive a reply from you after I sent you the requested log files. Do you know if it's possible to get our cephfs filesystem working again or is it better to give up the files on cephfs and start over again?
>>>>
>>>> We restarted the cluster several times but it's still degraded:
>>>> [root@th1-mon001 ~]# ceph -w
>>>>     cluster c78209f5-55ea-4c70-8968-2231d2b05560
>>>>      health HEALTH_WARN mds cluster is degraded
>>>>      monmap e3: 3 mons at {th1-mon001=10.1.2.21:6789/0,th1-mon002=10.1.2.22:6789/0,th1-mon003=10.1.2.23:6789/0},
>>>>             election epoch 432, quorum 0,1,2 th1-mon001,th1-mon002,th1-mon003
>>>>      mdsmap e190: 1/1/1 up {0=th1-mon001=up:replay}, 1 up:standby
>>>>      osdmap e2248: 12 osds: 12 up, 12 in
>>>>       pgmap v197548: 492 pgs, 4 pools, 60297 MB data, 470 kobjects
>>>>             124 GB used, 175 GB / 299 GB avail
>>>>                  491 active+clean
>>>>                    1 active+clean+scrubbing+deep
>>>>
>>>> One placement group stays in the deep scrubbing phase.
>>>>
>>>> Kind regards,
>>>>
>>>> Jasper Siero
>>>>
>>>>
>>>> ________________________________________
>>>> From: Jasper Siero
>>>> Sent: Thursday, August 21, 2014 16:43
>>>> To: Gregory Farnum
>>>> Subject: RE: [ceph-users] mds isn't working anymore after osd's running full
>>>>
>>>> I did restart it, and you are right about the epoch number, which has changed, but the situation looks the same.
>>>> 2014-08-21 16:33:06.032366 7f9b5f3cd700  1 mds.0.27 need osdmap epoch 1994, have 1993
>>>> 2014-08-21 16:33:06.032368 7f9b5f3cd700  1 mds.0.27 waiting for osdmap 1994 (which blacklists prior instance)
>>>> I started the mds with the debug options and attached the log.
>>>>
>>>> Thanks,
>>>>
>>>> Jasper
>>>> ________________________________________
>>>> From: Gregory Farnum [g...@inktank.com]
>>>> Sent: Wednesday, August 20, 2014 18:38
>>>> To: Jasper Siero
>>>> CC: ceph-users@lists.ceph.com
>>>> Subject: Re: [ceph-users] mds isn't working anymore after osd's running full
>>>>
>>>> After restarting your MDS, it still says it has epoch 1832 and needs epoch 1833? I think you didn't really restart it.
>>>> If the epoch numbers have changed, can you restart it with "debug mds = 20", "debug objecter = 20", "debug ms = 1" in the ceph.conf and post the resulting log file somewhere?
>>>> -Greg
>>>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>>>
>>>>
>>>> On Wed, Aug 20, 2014 at 12:49 AM, Jasper Siero <jasper.si...@target-holding.nl> wrote:
>>>>> Unfortunately that doesn't help. I restarted both the active and standby mds but that doesn't change the state of the mds. Is there a way to force the mds to look at the 1832 epoch (or earlier) instead of 1833 (need osdmap epoch 1833, have 1832)?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Jasper
>>>>> ________________________________________
>>>>> From: Gregory Farnum [g...@inktank.com]
>>>>> Sent: Tuesday, August 19, 2014 19:49
>>>>> To: Jasper Siero
>>>>> CC: ceph-users@lists.ceph.com
>>>>> Subject: Re: [ceph-users] mds isn't working anymore after osd's running full
>>>>>
>>>>> On Mon, Aug 18, 2014 at 6:56 AM, Jasper Siero <jasper.si...@target-holding.nl> wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> We have a small ceph cluster running version 0.80.1 with cephfs on five nodes.
>>>>>> Last week some osd's were full and shut themselves down. To help the osd's start again I added some extra osd's and moved some placement group directories on the full osd's (which have a copy on another osd) to another place on the node (as mentioned in http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/).
>>>>>> After clearing some space on the full osd's I started them again.
>>>>>> After a lot of deep scrubbing and two pg inconsistencies which needed to be repaired, everything looked fine except the mds, which is still in the replay state and stays that way.
>>>>>> The log below says that the mds needs osdmap epoch 1833 and has 1832.
>>>>>>
>>>>>> 2014-08-18 12:29:22.268248 7fa786182700  1 mds.-1.0 handle_mds_map standby
>>>>>> 2014-08-18 12:29:22.273995 7fa786182700  1 mds.0.25 handle_mds_map i am now mds.0.25
>>>>>> 2014-08-18 12:29:22.273998 7fa786182700  1 mds.0.25 handle_mds_map state change up:standby --> up:replay
>>>>>> 2014-08-18 12:29:22.274000 7fa786182700  1 mds.0.25 replay_start
>>>>>> 2014-08-18 12:29:22.274014 7fa786182700  1 mds.0.25 recovery set is
>>>>>> 2014-08-18 12:29:22.274016 7fa786182700  1 mds.0.25 need osdmap epoch 1833, have 1832
>>>>>> 2014-08-18 12:29:22.274017 7fa786182700  1 mds.0.25 waiting for osdmap 1833 (which blacklists prior instance)
>>>>>>
>>>>>> # ceph status
>>>>>>     cluster c78209f5-55ea-4c70-8968-2231d2b05560
>>>>>>      health HEALTH_WARN mds cluster is degraded
>>>>>>      monmap e3: 3 mons at {th1-mon001=10.1.2.21:6789/0,th1-mon002=10.1.2.22:6789/0,th1-mon003=10.1.2.23:6789/0},
>>>>>>             election epoch 362, quorum 0,1,2 th1-mon001,th1-mon002,th1-mon003
>>>>>>      mdsmap e154: 1/1/1 up {0=th1-mon001=up:replay}, 1 up:standby
>>>>>>      osdmap e1951: 12 osds: 12 up, 12 in
>>>>>>       pgmap v193685: 492 pgs, 4 pools, 60297 MB data, 470 kobjects
>>>>>>             124 GB used, 175 GB / 299 GB avail
>>>>>>                  492 active+clean
>>>>>>
>>>>>> # ceph osd tree
>>>>>> # id    weight    type name            up/down  reweight
>>>>>> -1      0.2399    root default
>>>>>> -2      0.05997     host th1-osd001
>>>>>> 0       0.01999       osd.0            up       1
>>>>>> 1       0.01999       osd.1            up       1
>>>>>> 2       0.01999       osd.2            up       1
>>>>>> -3      0.05997     host th1-osd002
>>>>>> 3       0.01999       osd.3            up       1
>>>>>> 4       0.01999       osd.4            up       1
>>>>>> 5       0.01999       osd.5            up       1
>>>>>> -4      0.05997     host th1-mon003
>>>>>> 6       0.01999       osd.6            up       1
>>>>>> 7       0.01999       osd.7            up       1
>>>>>> 8       0.01999       osd.8            up       1
>>>>>> -5      0.05997     host th1-mon002
>>>>>> 9       0.01999       osd.9            up       1
>>>>>> 10      0.01999       osd.10           up       1
>>>>>> 11      0.01999       osd.11           up       1
>>>>>>
>>>>>> What is the way to get the mds up and running again?
>>>>>>
>>>>>> I still have all the placement group directories which I moved from the full osds which were down to create disk space.
>>>>>
>>>>> Try just restarting the MDS daemon. This sounds a little familiar so I think it's a known bug which may be fixed in a later dev or point release on the MDS, but it's a soft-state rather than a disk state issue.
>>>>> -Greg
>>>>> Software Engineer #42 @ http://inktank.com | http://ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com