On Wed, Sep 27, 2017 at 1:18 PM, Richard Hesketh
<richard.hesk...@rd.bbc.co.uk> wrote:
> On 27/09/17 12:32, John Spray wrote:
>> On Wed, Sep 27, 2017 at 12:15 PM, Richard Hesketh
>> <richard.hesk...@rd.bbc.co.uk> wrote:
>>> As the subject says... any ceph fs administrative command I try to run 
>>> hangs forever and kills monitors in the background - sometimes they come 
>>> back, on a couple of occasions I had to manually stop/restart a suffering 
>>> mon. Trying to load the filesystem tab in the ceph-mgr dashboard dumps an 
>>> error and can also kill a monitor. However, clients can mount the 
>>> filesystem and read/write data without issue.
>>>
>>> Relevant excerpt from logs on an affected monitor, just trying to run 'ceph 
>>> fs ls':
>>>
>>> 2017-09-26 13:20:50.716087 7fc85fdd9700  0 mon.vm-ds-01@0(leader) e19 
>>> handle_command mon_command({"prefix": "fs ls"} v 0) v1
>>> 2017-09-26 13:20:50.727612 7fc85fdd9700  0 log_channel(audit) log [DBG] : 
>>> from='client.? 10.10.10.1:0/2771553898' entity='client.admin' 
>>> cmd=[{"prefix": "fs ls"}]: dispatch
>>> 2017-09-26 13:20:50.950373 7fc85fdd9700 -1 
>>> /build/ceph-12.2.0/src/osd/OSDMap.h: In function 'const string& 
>>> OSDMap::get_pool_name(int64_t) const' thread 7fc85fdd9700 time 2017-09-26 
>>> 13:20:50.727676
>>> /build/ceph-12.2.0/src/osd/OSDMap.h: 1176: FAILED assert(i != 
>>> pool_name.end())
>>>
>>>  ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous 
>>> (rc)
>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
>>> const*)+0x102) [0x55a8ca0bb642]
>>>  2: (()+0x48165f) [0x55a8c9f4165f]
>>>  3: 
>>> (MDSMonitor::preprocess_command(boost::intrusive_ptr<MonOpRequest>)+0x1d18) 
>>> [0x55a8ca047688]
>>>  4: 
>>> (MDSMonitor::preprocess_query(boost::intrusive_ptr<MonOpRequest>)+0x2a8) 
>>> [0x55a8ca048008]
>>>  5: (PaxosService::dispatch(boost::intrusive_ptr<MonOpRequest>)+0x700) 
>>> [0x55a8c9f9d1b0]
>>>  6: (Monitor::handle_command(boost::intrusive_ptr<MonOpRequest>)+0x1f93) 
>>> [0x55a8c9e63193]
>>>  7: (Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0xa0e) 
>>> [0x55a8c9e6a52e]
>>>  8: (Monitor::_ms_dispatch(Message*)+0x6db) [0x55a8c9e6b57b]
>>>  9: (Monitor::ms_dispatch(Message*)+0x23) [0x55a8c9e9a053]
>>>  10: (DispatchQueue::entry()+0xf4a) [0x55a8ca3b5f7a]
>>>  11: (DispatchQueue::DispatchThread::entry()+0xd) [0x55a8ca16bc1d]
>>>  12: (()+0x76ba) [0x7fc86b3ac6ba]
>>>  13: (clone()+0x6d) [0x7fc869bd63dd]
>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed 
>>> to interpret this.
>>>
>>> I'm running Luminous. The cluster and FS have been in service since Hammer 
>>> and have default data/metadata pool names. I discovered the issue after 
>>> attempting to enable directory sharding.
>>
>> Well that's not good...
>>
>> The assertion is because your FSMap is referring to a pool that
>> apparently no longer exists in the OSDMap.  This should be impossible
>> in current Ceph (we forbid removing pools if they're in use), but
>> could perhaps have been caused in an earlier version of Ceph when it
>> was possible to remove a pool even if CephFS was referring to it?
>>
>> Alternatively, perhaps something more severe is going on that's
>> causing your mons to see a wrong/inconsistent view of the world.  Has
>> the cluster ever been through any traumatic disaster recovery type
>> activity involving hand-editing any of the cluster maps?  What
>> intermediate versions has it passed through on the way from Hammer to
>> Luminous?
>>
>> Opened a ticket here: http://tracker.ceph.com/issues/21568
>>
>> John
>
> I've reviewed my notes (i.e. I've grepped my IRC logs); I actually inherited 
> this cluster from a colleague who left shortly after I joined, so 
> unfortunately there is some of its history I cannot fill in.
>
> Turns out the cluster actually predates Firefly. Looking at dates my 
> suspicion is that it went Emperor -> Firefly -> Giant -> Hammer. I inherited 
> it at Hammer, and took it Hammer -> Infernalis -> Jewel -> Luminous myself. I 
> know I did make sure to do the tmap_upgrade step on cephfs but can't remember 
> if I did it at Infernalis or Jewel.
>
> Infernalis was a tricky upgrade; the attempt was aborted once after the first 
> set of OSDs didn't come back up after upgrade (had to remove/downgrade and 
> readd), and setting sortbitwise as the documentation suggested after a 
> successful second attempt caused everything to break and degrade slowly until 
> it was unset and recovered. I never had disaster recovery involve mucking 
> around with the pools while I was administering it, but unfortunately I 
> cannot speak for the cluster's pre-Hammer history. The only pools I have 
> removed were ones I created temporarily for testing crush rules/benchmarking.

OK, so it sounds like a cluster with an interesting history and some
stories to tell :-)

> I have hand-edited the crush map (extract, decompile, modify, recompile, 
> inject) at times because I found it more convenient for creating new crush 
> rules than using the CLI tools, but not the OSD map.
>
> Why would the cephfs have been referring to other pools?

A filesystem can have more than one data pool; they're added at
runtime with the add_data_pool/rm_data_pool commands.  In old versions
of the code, someone could add a data pool, then delete the pool
itself, and forget to do rm_data_pool.

So, the next step is to try to get the FSMap out of the monitor's
store and see if that's really what's happening -- unfortunately,
when checking how to do that, I realised we missed updating the
human-readable output of ceph-monstore-tool when adding
multi-filesystem support... so here's how to get it out in binary
form and then decode it separately:

ceph-monstore-tool /var/lib/ceph/<wherever...> get mdsmap > fsmap.bin
ceph-dencoder import fsmap.bin type FSMap decode dump_json

If the simple theory is correct, then you'll see a pool referenced in
one of the pool/pools fields that doesn't actually exist in the OSDMap.
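To make the check concrete, here's a small sketch of what "cross-checking" the dump means: compare the pool IDs the FSMap references against the pool IDs that actually exist (e.g. from `ceph osd lspools`). The JSON below is made-up sample data, and the field names (`filesystems`, `mdsmap`, `data_pools`, `metadata_pool`) are assumptions based on typical Luminous-era dump output -- adjust them to whatever your actual dump shows:

```python
# Hypothetical sketch: find pool IDs referenced by a dumped FSMap that no
# longer exist in the OSDMap. The JSON and pool IDs here are invented
# sample data, not output from a real cluster.

import json

# Stand-in for `ceph-dencoder import fsmap.bin type FSMap decode dump_json`
fsmap_json = """
{
  "filesystems": [
    {"mdsmap": {"fs_name": "cephfs",
                "metadata_pool": 1,
                "data_pools": [2, 7]}}
  ]
}
"""

# Stand-in for the pool IDs reported by `ceph osd lspools`
existing_pools = {1, 2}

fsmap = json.loads(fsmap_json)
for fs in fsmap["filesystems"]:
    mdsmap = fs["mdsmap"]
    referenced = set(mdsmap["data_pools"]) | {mdsmap["metadata_pool"]}
    dangling = referenced - existing_pools
    if dangling:
        # A non-empty set here would match the failed assert: the FSMap
        # refers to a pool ID the OSDMap has no name for.
        print("%s: dangling pool ids %s" % (mdsmap["fs_name"], sorted(dangling)))
# -> cephfs: dangling pool ids [7]
```

In this sample, data pool 7 is the dangling reference; on a real dump, any ID that shows up here but not in `ceph osd lspools` would be the pool whose name lookup trips the `get_pool_name` assert.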

John

>
> Rich
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
