Can you try a 13.2.2 mgr?

Paul
--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Mon, Jan 7, 2019 at 11:52 PM Randall Smith <rbsm...@adams.edu> wrote:
>
> More follow up because, obviously, this is a weird problem. I was able to
> start up a luminous mgr and successfully join my 13.2.4 cluster. I still
> can't get a 13.2.4 mgr to join. I still get the same error I've had before.
> (See previously in the thread.)
>
> It definitely seems like something is screwy with the mimic mgr.
>
> On Mon, Jan 7, 2019 at 9:57 AM Randall Smith <rbsm...@adams.edu> wrote:
>>
>> I upgraded to 13.2.4 and, unsurprisingly, it did not solve the problem.
>> ceph-mgr still fails. What else do I need to look at to try to solve this?
>>
>> Thanks.
>>
>> On Fri, Jan 4, 2019 at 3:20 PM Randall Smith <rbsm...@adams.edu> wrote:
>>>
>>> Some more info that may or may not matter. :-) First off, I am running
>>> 13.2.3 on Ubuntu Xenial (ceph version 13.2.3
>>> (9bf3c8b1a04b0aa4a3cc78456a508f1c48e70279) mimic (stable)).
>>>
>>> Next, when I try running ceph-mgr with --no-mon-config, the app core dumps.
>>>
>>> 0> 2019-01-04 14:56:56.416 7fbcc71db380 -1
>>> /build/ceph-13.2.3/src/common/Timer.cc: In function 'virtual
>>> SafeTimer::~SafeTimer()' thread 7fbcc71db380 time 2019-01-04 14:56:56.419012
>>> /build/ceph-13.2.3/src/common/Timer.cc: 50: FAILED assert(thread == __null)
>>>
>>> ceph version 13.2.3 (9bf3c8b1a04b0aa4a3cc78456a508f1c48e70279) mimic
>>> (stable)
>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>> const*)+0x102) [0x7fbcbe5093c2]
>>> 2: (()+0x2e5587) [0x7fbcbe509587]
>>> 3: (()+0x2e12de) [0x7fbcbe5052de]
>>> 4: (MgrClient::~MgrClient()+0xc4) [0x5594f4]
>>> 5: (MgrStandby::~MgrStandby()+0x14d) [0x55063d]
>>> 6: (main()+0x24b) [0x49446b]
>>> 7: (__libc_start_main()+0xf0) [0x7fbcbcf51830]
>>> 8: (_start()+0x29) [0x497dc9]
>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
>>> to interpret this.
>>>
>>> --- logging levels ---
>>> 0/ 5 none
>>> 0/ 1 lockdep
>>> 0/ 1 context
>>> 1/ 1 crush
>>> 1/ 5 mds
>>> 1/ 5 mds_balancer
>>> 1/ 5 mds_locker
>>> 1/ 5 mds_log
>>> 1/ 5 mds_log_expire
>>> 1/ 5 mds_migrator
>>> 0/ 1 buffer
>>> 0/ 1 timer
>>> 0/ 1 filer
>>> 0/ 1 striper
>>> 0/ 1 objecter
>>> 0/ 5 rados
>>> 0/ 5 rbd
>>> 0/ 5 rbd_mirror
>>> 0/ 5 rbd_replay
>>> 0/ 5 journaler
>>> 0/ 5 objectcacher
>>> 0/ 5 client
>>> 1/ 5 osd
>>> 0/ 5 optracker
>>> 0/ 5 objclass
>>> 1/ 3 filestore
>>> 1/ 3 journal
>>> 10/10 ms
>>> 1/ 5 mon
>>> 0/10 monc
>>> 1/ 5 paxos
>>> 0/ 5 tp
>>> 1/ 5 auth
>>> 1/ 5 crypto
>>> 1/ 1 finisher
>>> 1/ 1 reserver
>>> 1/ 5 heartbeatmap
>>> 1/ 5 perfcounter
>>> 1/ 5 rgw
>>> 1/ 5 rgw_sync
>>> 1/10 civetweb
>>> 1/ 5 javaclient
>>> 1/ 5 asok
>>> 1/ 1 throttle
>>> 0/ 0 refs
>>> 1/ 5 xio
>>> 1/ 5 compressor
>>> 1/ 5 bluestore
>>> 1/ 5 bluefs
>>> 1/ 3 bdev
>>> 1/ 5 kstore
>>> 4/ 5 rocksdb
>>> 4/ 5 leveldb
>>> 4/ 5 memdb
>>> 1/ 5 kinetic
>>> 1/ 5 fuse
>>> 1/ 5 mgr
>>> 1/ 5 mgrc
>>> 1/ 5 dpdk
>>> 1/ 5 eventtrace
>>> -2/-2 (syslog threshold)
>>> 99/99 (stderr threshold)
>>> max_recent 10000
>>> max_new 1000
>>> log_file
>>> --- end dump of recent events ---
>>> *** Caught signal (Aborted) **
>>> in thread 7fbcc71db380 thread_name:ceph-mgr
>>> ceph version 13.2.3 (9bf3c8b1a04b0aa4a3cc78456a508f1c48e70279) mimic
>>> (stable)
>>> 1: /usr/bin/ceph-mgr() [0x63ebd0]
>>> 2: (()+0x11390) [0x7fbcbd819390]
>>> 3: (gsignal()+0x38) [0x7fbcbcf66428]
>>> 4: (abort()+0x16a) [0x7fbcbcf6802a]
>>> 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>> const*)+0x250) [0x7fbcbe509510]
>>> 6: (()+0x2e5587) [0x7fbcbe509587]
>>> 7: (()+0x2e12de) [0x7fbcbe5052de]
>>> 8: (MgrClient::~MgrClient()+0xc4) [0x5594f4]
>>> 9: (MgrStandby::~MgrStandby()+0x14d) [0x55063d]
>>> 10: (main()+0x24b) [0x49446b]
>>> 11: (__libc_start_main()+0xf0) [0x7fbcbcf51830]
>>> 12: (_start()+0x29) [0x497dc9]
>>> 2019-01-04 14:56:56.420 7fbcc71db380 -1 *** Caught signal (Aborted) **
>>> in thread 7fbcc71db380 thread_name:ceph-mgr
>>>
>>> ceph version 13.2.3 (9bf3c8b1a04b0aa4a3cc78456a508f1c48e70279) mimic
>>> (stable)
>>> 1: /usr/bin/ceph-mgr() [0x63ebd0]
>>> 2: (()+0x11390) [0x7fbcbd819390]
>>> 3: (gsignal()+0x38) [0x7fbcbcf66428]
>>> 4: (abort()+0x16a) [0x7fbcbcf6802a]
>>> 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>> const*)+0x250) [0x7fbcbe509510]
>>> 6: (()+0x2e5587) [0x7fbcbe509587]
>>> 7: (()+0x2e12de) [0x7fbcbe5052de]
>>> 8: (MgrClient::~MgrClient()+0xc4) [0x5594f4]
>>> 9: (MgrStandby::~MgrStandby()+0x14d) [0x55063d]
>>> 10: (main()+0x24b) [0x49446b]
>>> 11: (__libc_start_main()+0xf0) [0x7fbcbcf51830]
>>> 12: (_start()+0x29) [0x497dc9]
>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
>>> to interpret this.
>>>
>>>
>>> On Fri, Jan 4, 2019 at 1:53 PM Randall Smith <rbsm...@adams.edu> wrote:
>>>>
>>>> I think this is the relevant section of the debug log. There's no
>>>> AUTH_NONE error which would make things easy. You can see the same
>>>> "Invalid argument" error that I'm seeing in the mgr debug output. The
>>>> malformed request feels like a compatibility or protocol communication
>>>> issue.
>>>>
>>>> 2019-01-04 13:41:58.972 7f88950f5700 10 mon.07@1(peon) e27
>>>> ms_verify_authorizer 192.168.253.148:0/3301807723 client protocol 0
>>>> 2019-01-04 13:41:58.972 7f8890143700 10 mon.07@1(peon) e27 _ms_dispatch
>>>> new session 0x40a58c0 MonSession(client.? 192.168.253.148:0/3301807723 is
>>>> open , features 0x3ffddff8ffa4fffb (luminous)) features 0x3ffddff8ffa4fffb
>>>> 2019-01-04 13:41:58.972 7f8890143700 10 mon.07@1(peon).auth v87697
>>>> preprocess_query auth(proto 0 26 bytes epoch 0) v1 from client.?
>>>> 192.168.253.148:0/3301807723
>>>> 2019-01-04 13:41:58.972 7f8890143700 10 mon.07@1(peon).auth v87697
>>>> prep_auth() blob_size=26
>>>> 2019-01-04 13:41:58.972 7f8890143700 10 mon.07@1(peon).auth v87697
>>>> AuthMonitor::assign_global_id m=auth(proto 0 26 bytes epoch 0) v1 mon=1/3
>>>> last_allocated=12307825 max_global_id=12353896
>>>> 2019-01-04 13:41:58.972 7f8890143700 10 mon.07@1(peon).auth v87697
>>>> next_global_id should be 12307828
>>>> 2019-01-04 13:41:58.972 7f8890143700 2 mon.07@1(peon) e27 send_reply
>>>> 0x5449180 0x4ee1c00 auth_reply(proto 2 0 (0) Success) v1
>>>> 2019-01-04 13:41:58.972 7f8890143700 10 mon.07@1(peon).auth v87697
>>>> preprocess_query auth(proto 2 2 bytes epoch 0) v1 from client.?
>>>> 192.168.253.148:0/3301807723
>>>> 2019-01-04 13:41:58.972 7f8890143700 10 mon.07@1(peon).auth v87697
>>>> prep_auth() blob_size=2
>>>> 2019-01-04 13:41:58.972 7f8890143700 0 mon.07@1(peon).auth v87697 caught
>>>> error when trying to handle auth request, probably malformed request
>>>> 2019-01-04 13:41:58.972 7f8890143700 2 mon.07@1(peon) e27 send_reply
>>>> 0x30dc500 0x5caa280 auth_reply(proto 2 -22 (22) Invalid argument) v1
>>>> 2019-01-04 13:41:58.972 7f8890143700 10 mon.07@1(peon) e27 ms_handle_reset
>>>> 0x4102a00 192.168.253.148:0/3301807723
>>>> 2019-01-04 13:41:58.972 7f8890143700 10 mon.07@1(peon) e27 reset/close on
>>>> session client.? 192.168.253.148:0/3301807723
>>>> 2019-01-04 13:41:58.972 7f8890143700 10 mon.07@1(peon) e27 remove_session
>>>> 0x40a58c0 client.? 192.168.253.148:0/3301807723 features 0x3ffddff8ffa4fffb
>>>>
>>>> On Fri, Jan 4, 2019 at 12:32 PM Gregory Farnum <gfar...@redhat.com> wrote:
>>>>>
>>>>> You can also get more data by checking what the monitor logs for that
>>>>> manager on the connect attempt (if you turn up its debug mon or debug
>>>>> ms settings). If one of your managers is behaving, I'd examine its
>>>>> configuration file and compare to the others. For instance, that
>>>>> "Invalid argument" might mean the manager is trying to use "AUTH_NONE"
>>>>> (no CephX) and the monitors aren't allowing that.
>>>>> -Greg
>>>>>
>>>>> On Fri, Jan 4, 2019 at 6:26 AM Randall Smith <rbsm...@adams.edu> wrote:
>>>>> >
>>>>> > Greetings,
>>>>> >
>>>>> > I'm upgrading my cluster from luminous to mimic. I've upgraded my
>>>>> > monitors and am attempting to upgrade the mgrs. Unfortunately, after an
>>>>> > upgrade the mgr daemon exits immediately with error code 1.
>>>>> >
>>>>> > I've tried running ceph-mgr in debug mode to try to see what's
>>>>> > happening but the output (below) is a bit cryptic for me. It looks like
>>>>> > authentication might be failing but it was working prior to the upgrade.
>>>>> >
>>>>> > I do have "auth supported = cephx" in the global section of ceph.conf.
>>>>> >
>>>>> > What do I need to do to fix this?
>>>>> >
>>>>> > Thanks.
>>>>> >
>>>>> > /usr/bin/ceph-mgr -f --cluster ceph --id 8 --setuser ceph --setgroup
>>>>> > ceph -d --debug_ms 5
>>>>> > 2019-01-04 07:01:38.457 7f808f83f700 2 Event(0x30c42c0 nevent=5000
>>>>> > time_id=1).set_owner idx=0 owner=140190140331776
>>>>> > 2019-01-04 07:01:38.457 7f808f03e700 2 Event(0x30c4500 nevent=5000
>>>>> > time_id=1).set_owner idx=1 owner=140190131939072
>>>>> > 2019-01-04 07:01:38.457 7f808e83d700 2 Event(0x30c4e00 nevent=5000
>>>>> > time_id=1).set_owner idx=2 owner=140190123546368
>>>>> > 2019-01-04 07:01:38.457 7f809dd5b380 1 Processor -- start
>>>>> > 2019-01-04 07:01:38.477 7f809dd5b380 1 -- - start start
>>>>> > 2019-01-04 07:01:38.481 7f809dd5b380 1 -- - --> 192.168.253.147:6789/0
>>>>> > -- auth(proto 0 26 bytes epoch 0) v1 -- 0x32a6780 con 0
>>>>> > 2019-01-04 07:01:38.481 7f809dd5b380 1 -- - --> 192.168.253.148:6789/0
>>>>> > -- auth(proto 0 26 bytes epoch 0) v1 -- 0x32a6a00 con 0
>>>>> > 2019-01-04 07:01:38.481 7f808e83d700 1 -- 192.168.253.148:0/1359135487
>>>>> > learned_addr learned my addr 192.168.253.148:0/1359135487
>>>>> > 2019-01-04 07:01:38.481 7f808e83d700 2 -- 192.168.253.148:0/1359135487
>>>>> > >> 192.168.253.148:6789/0 conn(0x332d500 :-1
>>>>> > s=STATE_CONNECTING_WAIT_ACK_SEQ pgs=0 cs=0 l=0)._process_connection got
>>>>> > newly_acked_seq 0 vs out_seq 0
>>>>> > 2019-01-04 07:01:38.481 7f808f03e700 2 -- 192.168.253.148:0/1359135487
>>>>> > >> 192.168.253.147:6789/0 conn(0x332ce00 :-1
>>>>> > s=STATE_CONNECTING_WAIT_ACK_SEQ pgs=0 cs=0 l=0)._process_connection got
>>>>> > newly_acked_seq 0 vs out_seq 0
>>>>> > 2019-01-04 07:01:38.481 7f808f03e700 5 -- 192.168.253.148:0/1359135487
>>>>> > >> 192.168.253.147:6789/0 conn(0x332ce00 :-1
>>>>> > s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=74172 cs=1 l=1). rx
>>>>> > mon.1 seq 1 0x30c5440 mon_map magic: 0 v1
>>>>> > 2019-01-04 07:01:38.481 7f808e83d700 5 -- 192.168.253.148:0/1359135487
>>>>> > >> 192.168.253.148:6789/0 conn(0x332d500 :-1
>>>>> > s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=74275 cs=1 l=1). rx
>>>>> > mon.2 seq 1 0x30c5680 mon_map magic: 0 v1
>>>>> > 2019-01-04 07:01:38.481 7f808f03e700 5 -- 192.168.253.148:0/1359135487
>>>>> > >> 192.168.253.147:6789/0 conn(0x332ce00 :-1
>>>>> > s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=74172 cs=1 l=1). rx
>>>>> > mon.1 seq 2 0x32a6780 auth_reply(proto 2 0 (0) Success) v1
>>>>> > 2019-01-04 07:01:38.481 7f808e83d700 5 -- 192.168.253.148:0/1359135487
>>>>> > >> 192.168.253.148:6789/0 conn(0x332d500 :-1
>>>>> > s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=74275 cs=1 l=1). rx
>>>>> > mon.2 seq 2 0x32a6a00 auth_reply(proto 2 0 (0) Success) v1
>>>>> > 2019-01-04 07:01:38.481 7f808e03c700 1 -- 192.168.253.148:0/1359135487
>>>>> > <== mon.1 192.168.253.147:6789/0 1 ==== mon_map magic: 0 v1 ====
>>>>> > 370+0+0 (3034216899 0 0) 0x30c5440 con 0x332ce00
>>>>> > 2019-01-04 07:01:38.481 7f808e03c700 1 -- 192.168.253.148:0/1359135487
>>>>> > <== mon.2 192.168.253.148:6789/0 1 ==== mon_map magic: 0 v1 ====
>>>>> > 370+0+0 (3034216899 0 0) 0x30c5680 con 0x332d500
>>>>> > 2019-01-04 07:01:38.481 7f808e03c700 1 -- 192.168.253.148:0/1359135487
>>>>> > <== mon.1 192.168.253.147:6789/0 2 ==== auth_reply(proto 2 0 (0)
>>>>> > Success) v1 ==== 33+0+0 (3430158761 0 0) 0x32a6780 con 0x332ce00
>>>>> > 2019-01-04 07:01:38.481 7f808e03c700 1 -- 192.168.253.148:0/1359135487
>>>>> > --> 192.168.253.147:6789/0 -- auth(proto 2 2 bytes epoch 0) v1 --
>>>>> > 0x32a6f00 con 0
>>>>> > 2019-01-04 07:01:38.481 7f808e03c700 1 -- 192.168.253.148:0/1359135487
>>>>> > <== mon.2 192.168.253.148:6789/0 2 ==== auth_reply(proto 2 0 (0)
>>>>> > Success) v1 ==== 33+0+0 (3242503871 0 0) 0x32a6a00 con 0x332d500
>>>>> > 2019-01-04 07:01:38.481 7f808e03c700 1 -- 192.168.253.148:0/1359135487
>>>>> > --> 192.168.253.148:6789/0 -- auth(proto 2 2 bytes epoch 0) v1 --
>>>>> > 0x32a6780 con 0
>>>>> > 2019-01-04 07:01:38.481 7f808f03e700 5 -- 192.168.253.148:0/1359135487
>>>>> > >> 192.168.253.147:6789/0 conn(0x332ce00 :-1
>>>>> > s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=74172 cs=1 l=1). rx
>>>>> > mon.1 seq 3 0x32a6f00 auth_reply(proto 2 -22 (22) Invalid argument) v1
>>>>> > 2019-01-04 07:01:38.481 7f808e03c700 1 -- 192.168.253.148:0/1359135487
>>>>> > <== mon.1 192.168.253.147:6789/0 3 ==== auth_reply(proto 2 -22 (22)
>>>>> > Invalid argument) v1 ==== 24+0+0 (882932531 0 0) 0x32a6f00 con 0x332ce00
>>>>> > 2019-01-04 07:01:38.481 7f808e03c700 1 -- 192.168.253.148:0/1359135487
>>>>> > >> 192.168.253.147:6789/0 conn(0x332ce00 :-1 s=STATE_OPEN pgs=74172
>>>>> > cs=1 l=1).mark_down
>>>>> > 2019-01-04 07:01:38.481 7f808e03c700 2 -- 192.168.253.148:0/1359135487
>>>>> > >> 192.168.253.147:6789/0 conn(0x332ce00 :-1 s=STATE_OPEN pgs=74172
>>>>> > cs=1 l=1)._stop
>>>>> > 2019-01-04 07:01:38.481 7f808e83d700 5 -- 192.168.253.148:0/1359135487
>>>>> > >> 192.168.253.148:6789/0 conn(0x332d500 :-1
>>>>> > s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=74275 cs=1 l=1). rx
>>>>> > mon.2 seq 3 0x32a6780 auth_reply(proto 2 -22 (22) Invalid argument) v1
>>>>> > 2019-01-04 07:01:38.481 7f808e03c700 1 -- 192.168.253.148:0/1359135487
>>>>> > <== mon.2 192.168.253.148:6789/0 3 ==== auth_reply(proto 2 -22 (22)
>>>>> > Invalid argument) v1 ==== 24+0+0 (1359424806 0 0) 0x32a6780 con 0x332d500
>>>>> > 2019-01-04 07:01:38.481 7f808e03c700 1 -- 192.168.253.148:0/1359135487
>>>>> > >> 192.168.253.148:6789/0 conn(0x332d500 :-1 s=STATE_OPEN pgs=74275
>>>>> > cs=1 l=1).mark_down
>>>>> > 2019-01-04 07:01:38.481 7f808e03c700 2 -- 192.168.253.148:0/1359135487
>>>>> > >> 192.168.253.148:6789/0 conn(0x332d500 :-1 s=STATE_OPEN pgs=74275
>>>>> > cs=1 l=1)._stop
>>>>> >
>>>>> > 2019-01-04 07:01:38.481 7f809dd5b380 1 -- 192.168.253.148:0/1359135487
>>>>> > shutdown_connections
>>>>> > 2019-01-04 07:01:38.481 7f809dd5b380 5 -- 192.168.253.148:0/1359135487
>>>>> > shutdown_connections mark down 192.168.253.148:6789/0 0x332d500
>>>>> > 2019-01-04 07:01:38.481 7f809dd5b380 5 -- 192.168.253.148:0/1359135487
>>>>> > shutdown_connections mark down 192.168.253.147:6789/0 0x332ce00
>>>>> > 2019-01-04 07:01:38.481 7f809dd5b380 5 -- 192.168.253.148:0/1359135487
>>>>> > shutdown_connections delete 0x332ce00
>>>>> > 2019-01-04 07:01:38.481 7f809dd5b380 5 -- 192.168.253.148:0/1359135487
>>>>> > shutdown_connections delete 0x332d500
>>>>> > 2019-01-04 07:01:38.485 7f809dd5b380 1 -- 192.168.253.148:0/1359135487
>>>>> > shutdown_connections
>>>>> > 2019-01-04 07:01:38.485 7f809dd5b380 1 -- 192.168.253.148:0/1359135487
>>>>> > wait complete.
>>>>> > 2019-01-04 07:01:38.485 7f809dd5b380 1 -- 192.168.253.148:0/1359135487
>>>>> > >> 192.168.253.148:0/1359135487 conn(0x332c000 :-1 s=STATE_NONE pgs=0
>>>>> > cs=0 l=0).mark_down
>>>>> > 2019-01-04 07:01:38.485 7f809dd5b380 2 -- 192.168.253.148:0/1359135487
>>>>> > >> 192.168.253.148:0/1359135487 conn(0x332c000 :-1 s=STATE_NONE pgs=0
>>>>> > cs=0 l=0)._stop
>>>>> > failed to fetch mon config (--no-mon-config to skip)
>>>>> >
>>>>> > --
>>>>> > Randall Smith
>>>>> > Computing Services
>>>>> > Adams State University
>>>>> > http://www.adams.edu/
>>>>> > 719-587-7741
>>>>> > _______________________________________________
>>>>> > ceph-users mailing list
>>>>> > ceph-users@lists.ceph.com
>>>>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
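In both the mgr and the monitor logs above, the sequence is the same. The mgr's first auth message (`auth(proto 0 26 bytes epoch 0)`) succeeds, and the monitors pick CephX (`proto 2`). They then reject the mgr's 2-byte CephX request (`auth(proto 2 2 bytes epoch 0)`) with `auth_reply(proto 2 -22 (22) Invalid argument)`. When comparing a working mgr against a failing one, it can help to pull just that exchange out of a long debug log. The sketch below is a hypothetical helper, not part of Ceph. It assumes GNU grep (for `-o`) and unwrapped log lines in the message format quoted in this thread.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Ceph) for reading the auth handshake out of
# an unwrapped ceph-mgr / ceph-mon debug log. Assumes GNU grep for -o.
# Each function reads the file named in $1, or stdin when no file is given.

# Print each auth request and reply in log order, for example:
#   auth(proto 0 26 bytes     <- initial request, protocol not yet negotiated
#   auth_reply(proto 2 0      <- monitor chose CephX (proto 2), result 0
#   auth(proto 2 2 bytes      <- CephX request from the client
#   auth_reply(proto 2 -22    <- rejected with -EINVAL ("Invalid argument")
auth_exchange() {
    grep -oE 'auth\(proto [0-9]+ [0-9]+ bytes|auth_reply\(proto [0-9]+ -?[0-9]+' "${1:--}"
}

# Count log lines holding an auth_reply with a negative (error) result code.
auth_failures() {
    # grep -c exits 1 when nothing matches; "|| true" keeps it usable under set -e.
    grep -cE 'auth_reply\(proto [0-9]+ -[0-9]+' "${1:--}" || true
}
```

Because `-d` sends the daemon's log to stderr, the foreground command from the original post can be piped straight in: `/usr/bin/ceph-mgr -f --cluster ceph --id 8 --setuser ceph --setgroup ceph -d --debug_ms 5 2>&1 | auth_exchange`. At `debug_ms 5` each received message is logged twice (once as `rx`, once as `<==`), so expect each reply to appear twice.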