More follow-up because, obviously, this is a weird problem. I was able to
start a luminous mgr and have it successfully join my 13.2.4 cluster. I
still can't get a 13.2.4 mgr to join; I get the same error I've had before
(see earlier in the thread).

It definitely seems like something is screwy with the mimic mgr.
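For anyone comparing notes: the mon log earlier in the thread shows the mons
answering the mgr's second cephx step (the 2-byte proto 2 blob) with EINVAL,
so one thing I'm double-checking is that auth is spelled out with the modern
keys rather than only the old "auth supported" form, and that the mgr's key
on disk matches what the mons have (ceph auth get mgr.8 vs the local
keyring). A minimal ceph.conf sketch; whether this affects the handshake
here is only a guess on my part:

```ini
[global]
# Explicit modern cephx settings; "auth supported = cephx" is the
# older combined form of these three options.
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
```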

On Mon, Jan 7, 2019 at 9:57 AM Randall Smith <rbsm...@adams.edu> wrote:

> I upgraded to 13.2.4 and, unsurprisingly, it did not solve the problem.
> ceph-mgr still fails. What else do I need to look at to try to solve this?
>
> Thanks.
>
> On Fri, Jan 4, 2019 at 3:20 PM Randall Smith <rbsm...@adams.edu> wrote:
>
>> Some more info that may or may not matter. :-) First off, I am running
>> 13.2.3 on Ubuntu Xenial (ceph version 13.2.3
>> (9bf3c8b1a04b0aa4a3cc78456a508f1c48e70279) mimic (stable)).
>>
>> Next, when I try running ceph-mgr with --no-mon-config, the app core
>> dumps.
>>
>>      0> 2019-01-04 14:56:56.416 7fbcc71db380 -1
>> /build/ceph-13.2.3/src/common/Timer.cc: In function 'virtual
>> SafeTimer::~SafeTimer()' thread 7fbcc71db380 time 2019-01-04 14:56:56.419012
>> /build/ceph-13.2.3/src/common/Timer.cc: 50: FAILED assert(thread ==
>> __null)
>>
>>  ceph version 13.2.3 (9bf3c8b1a04b0aa4a3cc78456a508f1c48e70279) mimic
>> (stable)
>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>> const*)+0x102) [0x7fbcbe5093c2]
>>  2: (()+0x2e5587) [0x7fbcbe509587]
>>  3: (()+0x2e12de) [0x7fbcbe5052de]
>>  4: (MgrClient::~MgrClient()+0xc4) [0x5594f4]
>>  5: (MgrStandby::~MgrStandby()+0x14d) [0x55063d]
>>  6: (main()+0x24b) [0x49446b]
>>  7: (__libc_start_main()+0xf0) [0x7fbcbcf51830]
>>  8: (_start()+0x29) [0x497dc9]
>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
>> to interpret this.
>>
>> --- logging levels ---
>>    0/ 5 none
>>    0/ 1 lockdep
>>    0/ 1 context
>>    1/ 1 crush
>>    1/ 5 mds
>>    1/ 5 mds_balancer
>>    1/ 5 mds_locker
>>    1/ 5 mds_log
>>    1/ 5 mds_log_expire
>>    1/ 5 mds_migrator
>>    0/ 1 buffer
>>    0/ 1 timer
>>    0/ 1 filer
>>    0/ 1 striper
>>    0/ 1 objecter
>>    0/ 5 rados
>>    0/ 5 rbd
>>    0/ 5 rbd_mirror
>>    0/ 5 rbd_replay
>>    0/ 5 journaler
>>    0/ 5 objectcacher
>>    0/ 5 client
>>    1/ 5 osd
>>    0/ 5 optracker
>>    0/ 5 objclass
>>    1/ 3 filestore
>>    1/ 3 journal
>>   10/10 ms
>>    1/ 5 mon
>>    0/10 monc
>>    1/ 5 paxos
>>    0/ 5 tp
>>    1/ 5 auth
>>    1/ 5 crypto
>>    1/ 1 finisher
>>    1/ 1 reserver
>>    1/ 5 heartbeatmap
>>    1/ 5 perfcounter
>>    1/ 5 rgw
>>    1/ 5 rgw_sync
>>    1/10 civetweb
>>    1/ 5 javaclient
>>    1/ 5 asok
>>    1/ 1 throttle
>>    0/ 0 refs
>>    1/ 5 xio
>>    1/ 5 compressor
>>    1/ 5 bluestore
>>    1/ 5 bluefs
>>    1/ 3 bdev
>>    1/ 5 kstore
>>    4/ 5 rocksdb
>>    4/ 5 leveldb
>>    4/ 5 memdb
>>    1/ 5 kinetic
>>    1/ 5 fuse
>>    1/ 5 mgr
>>    1/ 5 mgrc
>>    1/ 5 dpdk
>>    1/ 5 eventtrace
>>   -2/-2 (syslog threshold)
>>   99/99 (stderr threshold)
>>   max_recent     10000
>>   max_new         1000
>>   log_file
>> --- end dump of recent events ---
>> *** Caught signal (Aborted) **
>>  in thread 7fbcc71db380 thread_name:ceph-mgr
>>  ceph version 13.2.3 (9bf3c8b1a04b0aa4a3cc78456a508f1c48e70279) mimic
>> (stable)
>>  1: /usr/bin/ceph-mgr() [0x63ebd0]
>>  2: (()+0x11390) [0x7fbcbd819390]
>>  3: (gsignal()+0x38) [0x7fbcbcf66428]
>>  4: (abort()+0x16a) [0x7fbcbcf6802a]
>>  5: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>> const*)+0x250) [0x7fbcbe509510]
>>  6: (()+0x2e5587) [0x7fbcbe509587]
>>  7: (()+0x2e12de) [0x7fbcbe5052de]
>>  8: (MgrClient::~MgrClient()+0xc4) [0x5594f4]
>>  9: (MgrStandby::~MgrStandby()+0x14d) [0x55063d]
>>  10: (main()+0x24b) [0x49446b]
>>  11: (__libc_start_main()+0xf0) [0x7fbcbcf51830]
>>  12: (_start()+0x29) [0x497dc9]
>> 2019-01-04 14:56:56.420 7fbcc71db380 -1 *** Caught signal (Aborted) **
>>  in thread 7fbcc71db380 thread_name:ceph-mgr
>>
>>  ceph version 13.2.3 (9bf3c8b1a04b0aa4a3cc78456a508f1c48e70279) mimic
>> (stable)
>>  1: /usr/bin/ceph-mgr() [0x63ebd0]
>>  2: (()+0x11390) [0x7fbcbd819390]
>>  3: (gsignal()+0x38) [0x7fbcbcf66428]
>>  4: (abort()+0x16a) [0x7fbcbcf6802a]
>>  5: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>> const*)+0x250) [0x7fbcbe509510]
>>  6: (()+0x2e5587) [0x7fbcbe509587]
>>  7: (()+0x2e12de) [0x7fbcbe5052de]
>>  8: (MgrClient::~MgrClient()+0xc4) [0x5594f4]
>>  9: (MgrStandby::~MgrStandby()+0x14d) [0x55063d]
>>  10: (main()+0x24b) [0x49446b]
>>  11: (__libc_start_main()+0xf0) [0x7fbcbcf51830]
>>  12: (_start()+0x29) [0x497dc9]
>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
>> to interpret this.
>>
>>
>> On Fri, Jan 4, 2019 at 1:53 PM Randall Smith <rbsm...@adams.edu> wrote:
>>
>>> I think this is the relevant section of the debug log. There's no
>>> AUTH_NONE error, which would have made things easy. You can see the same
>>> "Invalid argument" error that I'm seeing in the mgr debug output. The
>>> malformed request feels like a compatibility or protocol issue.
>>>
>>> 2019-01-04 13:41:58.972 7f88950f5700 10 mon.07@1(peon) e27
>>> ms_verify_authorizer 192.168.253.148:0/3301807723 client protocol 0
>>> 2019-01-04 13:41:58.972 7f8890143700 10 mon.07@1(peon) e27 _ms_dispatch
>>> new session 0x40a58c0 MonSession(client.? 192.168.253.148:0/3301807723
>>> is open , features 0x3ffddff8ffa4fffb (luminous)) features
>>> 0x3ffddff8ffa4fffb
>>> 2019-01-04 13:41:58.972 7f8890143700 10 mon.07@1(peon).auth v87697
>>> preprocess_query auth(proto 0 26 bytes epoch 0) v1 from client.?
>>> 192.168.253.148:0/3301807723
>>> 2019-01-04 13:41:58.972 7f8890143700 10 mon.07@1(peon).auth v87697
>>> prep_auth() blob_size=26
>>> 2019-01-04 13:41:58.972 7f8890143700 10 mon.07@1(peon).auth v87697
>>> AuthMonitor::assign_global_id m=auth(proto 0 26 bytes epoch 0) v1 mon=1/3
>>> last_allocated=12307825 max_global_id=12353896
>>> 2019-01-04 13:41:58.972 7f8890143700 10 mon.07@1(peon).auth v87697
>>> next_global_id should be 12307828
>>> 2019-01-04 13:41:58.972 7f8890143700  2 mon.07@1(peon) e27 send_reply
>>> 0x5449180 0x4ee1c00 auth_reply(proto 2 0 (0) Success) v1
>>> 2019-01-04 13:41:58.972 7f8890143700 10 mon.07@1(peon).auth v87697
>>> preprocess_query auth(proto 2 2 bytes epoch 0) v1 from client.?
>>> 192.168.253.148:0/3301807723
>>> 2019-01-04 13:41:58.972 7f8890143700 10 mon.07@1(peon).auth v87697
>>> prep_auth() blob_size=2
>>> 2019-01-04 13:41:58.972 7f8890143700  0 mon.07@1(peon).auth v87697
>>> caught error when trying to handle auth request, probably malformed
>>> request
>>> 2019-01-04 13:41:58.972 7f8890143700  2 mon.07@1(peon) e27 send_reply
>>> 0x30dc500 0x5caa280 auth_reply(proto 2 -22 (22) Invalid argument) v1
>>> 2019-01-04 13:41:58.972 7f8890143700 10 mon.07@1(peon) e27
>>> ms_handle_reset 0x4102a00 192.168.253.148:0/3301807723
>>> 2019-01-04 13:41:58.972 7f8890143700 10 mon.07@1(peon) e27 reset/close
>>> on session client.? 192.168.253.148:0/3301807723
>>> 2019-01-04 13:41:58.972 7f8890143700 10 mon.07@1(peon) e27
>>> remove_session 0x40a58c0 client.? 192.168.253.148:0/3301807723 features
>>> 0x3ffddff8ffa4fffb
>>> On Fri, Jan 4, 2019 at 12:32 PM Gregory Farnum <gfar...@redhat.com>
>>> wrote:
>>>
>>>> You can also get more data by checking what the monitor logs for that
>>>> manager on the connect attempt (if you turn up its debug mon or debug
>>>> ms settings). If one of your managers is behaving, I'd examine its
>>>> configuration file and compare to the others. For instance, that
>>>> "Invalid argument" might mean the manager is trying to use "AUTH_NONE"
>>>> (no CephX) and the monitors aren't allowing that.
>>>> -Greg
>>>>
>>>> On Fri, Jan 4, 2019 at 6:26 AM Randall Smith <rbsm...@adams.edu> wrote:
>>>> >
>>>> > Greetings,
>>>> >
>>>> > I'm upgrading my cluster from luminous to mimic. I've upgraded my
>>>> monitors and am attempting to upgrade the mgrs. Unfortunately, after an
>>>> upgrade the mgr daemon exits immediately with error code 1.
>>>> >
>>>> > I've tried running ceph-mgr in debug mode to see what's happening,
>>>> but the output (below) is a bit cryptic to me. It looks like
>>>> authentication might be failing, but it was working prior to the upgrade.
>>>> >
>>>> > I do have "auth supported = cephx" in the global section of ceph.conf.
>>>> >
>>>> > What do I need to do to fix this?
>>>> >
>>>> > Thanks.
>>>> >
>>>> > /usr/bin/ceph-mgr -f --cluster ceph --id 8 --setuser ceph --setgroup
>>>> ceph -d --debug_ms 5
>>>> > 2019-01-04 07:01:38.457 7f808f83f700  2 Event(0x30c42c0 nevent=5000
>>>> time_id=1).set_owner idx=0 owner=140190140331776
>>>> > 2019-01-04 07:01:38.457 7f808f03e700  2 Event(0x30c4500 nevent=5000
>>>> time_id=1).set_owner idx=1 owner=140190131939072
>>>> > 2019-01-04 07:01:38.457 7f808e83d700  2 Event(0x30c4e00 nevent=5000
>>>> time_id=1).set_owner idx=2 owner=140190123546368
>>>> > 2019-01-04 07:01:38.457 7f809dd5b380  1  Processor -- start
>>>> > 2019-01-04 07:01:38.477 7f809dd5b380  1 -- - start start
>>>> > 2019-01-04 07:01:38.481 7f809dd5b380  1 -- - -->
>>>> 192.168.253.147:6789/0 -- auth(proto 0 26 bytes epoch 0) v1 --
>>>> 0x32a6780 con 0
>>>> > 2019-01-04 07:01:38.481 7f809dd5b380  1 -- - -->
>>>> 192.168.253.148:6789/0 -- auth(proto 0 26 bytes epoch 0) v1 --
>>>> 0x32a6a00 con 0
>>>> > 2019-01-04 07:01:38.481 7f808e83d700  1 --
>>>> 192.168.253.148:0/1359135487 learned_addr learned my addr
>>>> 192.168.253.148:0/1359135487
>>>> > 2019-01-04 07:01:38.481 7f808e83d700  2 --
>>>> 192.168.253.148:0/1359135487 >> 192.168.253.148:6789/0 conn(0x332d500
>>>> :-1 s=STATE_CONNECTING_WAIT_ACK_SEQ pgs=0 cs=0 l=0)._process_connection got
>>>> newly_acked_seq 0 vs out_seq 0
>>>> > 2019-01-04 07:01:38.481 7f808f03e700  2 --
>>>> 192.168.253.148:0/1359135487 >> 192.168.253.147:6789/0 conn(0x332ce00
>>>> :-1 s=STATE_CONNECTING_WAIT_ACK_SEQ pgs=0 cs=0 l=0)._process_connection got
>>>> newly_acked_seq 0 vs out_seq 0
>>>> > 2019-01-04 07:01:38.481 7f808f03e700  5 --
>>>> 192.168.253.148:0/1359135487 >> 192.168.253.147:6789/0 conn(0x332ce00
>>>> :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=74172 cs=1 l=1). rx
>>>> mon.1 seq
>>>> > 1 0x30c5440 mon_map magic: 0 v1
>>>> > 2019-01-04 07:01:38.481 7f808e83d700  5 --
>>>> 192.168.253.148:0/1359135487 >> 192.168.253.148:6789/0 conn(0x332d500
>>>> :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=74275 cs=1 l=1). rx
>>>> mon.2 seq
>>>> > 1 0x30c5680 mon_map magic: 0 v1
>>>> > 2019-01-04 07:01:38.481 7f808f03e700  5 --
>>>> 192.168.253.148:0/1359135487 >> 192.168.253.147:6789/0 conn(0x332ce00
>>>> :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=74172 cs=1 l=1). rx
>>>> mon.1 seq
>>>> > 2 0x32a6780 auth_reply(proto 2 0 (0) Success) v1
>>>> > 2019-01-04 07:01:38.481 7f808e83d700  5 --
>>>> 192.168.253.148:0/1359135487 >> 192.168.253.148:6789/0 conn(0x332d500
>>>> :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=74275 cs=1 l=1). rx
>>>> mon.2 seq
>>>> > 2 0x32a6a00 auth_reply(proto 2 0 (0) Success) v1
>>>> > 2019-01-04 07:01:38.481 7f808e03c700  1 --
>>>> 192.168.253.148:0/1359135487 <== mon.1 192.168.253.147:6789/0 1 ====
>>>> mon_map magic: 0 v1 ==== 370+0+0 (3034216899 0 0) 0x30c5440 con 0x332ce00
>>>> > 2019-01-04 07:01:38.481 7f808e03c700  1 --
>>>> 192.168.253.148:0/1359135487 <== mon.2 192.168.253.148:6789/0 1 ====
>>>> mon_map magic: 0 v1 ==== 370+0+0 (3034216899 0 0) 0x30c5680 con 0x332d500
>>>> > 2019-01-04 07:01:38.481 7f808e03c700  1 --
>>>> 192.168.253.148:0/1359135487 <== mon.1 192.168.253.147:6789/0 2 ====
>>>> auth_reply(proto 2 0 (0) Success) v1 ==== 33+0+0 (3430158761 0 0) 0x32a6780
>>>> con 0x332ce00
>>>> > 2019-01-04 07:01:38.481 7f808e03c700  1 --
>>>> 192.168.253.148:0/1359135487 --> 192.168.253.147:6789/0 -- auth(proto
>>>> 2 2 bytes epoch 0) v1 -- 0x32a6f00 con 0
>>>> > 2019-01-04 07:01:38.481 7f808e03c700  1 --
>>>> 192.168.253.148:0/1359135487 <== mon.2 192.168.253.148:6789/0 2 ====
>>>> auth_reply(proto 2 0 (0) Success) v1 ==== 33+0+0 (3242503871 0 0) 0x32a6a00
>>>> con 0x332d500
>>>> > 2019-01-04 07:01:38.481 7f808e03c700  1 --
>>>> 192.168.253.148:0/1359135487 --> 192.168.253.148:6789/0 -- auth(proto
>>>> 2 2 bytes epoch 0) v1 -- 0x32a6780 con 0
>>>> > 2019-01-04 07:01:38.481 7f808f03e700  5 --
>>>> 192.168.253.148:0/1359135487 >> 192.168.253.147:6789/0 conn(0x332ce00
>>>> :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=74172 cs=1 l=1). rx
>>>> mon.1 seq
>>>> > 3 0x32a6f00 auth_reply(proto 2 -22 (22) Invalid argument) v1
>>>> > 2019-01-04 07:01:38.481 7f808e03c700  1 --
>>>> 192.168.253.148:0/1359135487 <== mon.1 192.168.253.147:6789/0 3 ====
>>>> auth_reply(proto 2 -22 (22) Invalid argument) v1 ==== 24+0+0 (882932531 0
>>>> 0) 0x32a6f00 con 0x332ce00
>>>> > 2019-01-04 07:01:38.481 7f808e03c700  1 --
>>>> 192.168.253.148:0/1359135487 >> 192.168.253.147:6789/0 conn(0x332ce00
>>>> :-1 s=STATE_OPEN pgs=74172 cs=1 l=1).mark_down
>>>> > 2019-01-04 07:01:38.481 7f808e03c700  2 --
>>>> 192.168.253.148:0/1359135487 >> 192.168.253.147:6789/0 conn(0x332ce00
>>>> :-1 s=STATE_OPEN pgs=74172 cs=1 l=1)._stop
>>>> > 2019-01-04 07:01:38.481 7f808e83d700  5 --
>>>> 192.168.253.148:0/1359135487 >> 192.168.253.148:6789/0 conn(0x332d500
>>>> :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=74275 cs=1 l=1). rx
>>>> mon.2 seq
>>>> > 3 0x32a6780 auth_reply(proto 2 -22 (22) Invalid argument) v1
>>>> > 2019-01-04 07:01:38.481 7f808e03c700  1 --
>>>> 192.168.253.148:0/1359135487 <== mon.2 192.168.253.148:6789/0 3 ====
>>>> auth_reply(proto 2 -22 (22) Invalid argument) v1 ==== 24+0+0 (1359424806 0
>>>> 0) 0x32a6780 con 0x332d500
>>>> > 2019-01-04 07:01:38.481 7f808e03c700  1 --
>>>> 192.168.253.148:0/1359135487 >> 192.168.253.148:6789/0 conn(0x332d500
>>>> :-1 s=STATE_OPEN pgs=74275 cs=1 l=1).mark_down
>>>> > 2019-01-04 07:01:38.481 7f808e03c700  2 --
>>>> 192.168.253.148:0/1359135487 >> 192.168.253.148:6789/0 conn(0x332d500
>>>> :-1 s=STATE_OPEN pgs=74275 cs=1 l=1)._stop
>>>> >
>>>> > 2019-01-04 07:01:38.481 7f809dd5b380  1 --
>>>> 192.168.253.148:0/1359135487 shutdown_connections
>>>> > 2019-01-04 07:01:38.481 7f809dd5b380  5 --
>>>> 192.168.253.148:0/1359135487 shutdown_connections mark down
>>>> 192.168.253.148:6789/0 0x332d500
>>>> > 2019-01-04 07:01:38.481 7f809dd5b380  5 --
>>>> 192.168.253.148:0/1359135487 shutdown_connections mark down
>>>> 192.168.253.147:6789/0 0x332ce00
>>>> > 2019-01-04 07:01:38.481 7f809dd5b380  5 --
>>>> 192.168.253.148:0/1359135487 shutdown_connections delete 0x332ce00
>>>> > 2019-01-04 07:01:38.481 7f809dd5b380  5 --
>>>> 192.168.253.148:0/1359135487 shutdown_connections delete 0x332d500
>>>> > 2019-01-04 07:01:38.485 7f809dd5b380  1 --
>>>> 192.168.253.148:0/1359135487 shutdown_connections
>>>> > 2019-01-04 07:01:38.485 7f809dd5b380  1 --
>>>> 192.168.253.148:0/1359135487 wait complete.
>>>> > 2019-01-04 07:01:38.485 7f809dd5b380  1 --
>>>> 192.168.253.148:0/1359135487 >> 192.168.253.148:0/1359135487
>>>> conn(0x332c000 :-1 s=STATE_NONE pgs=0 cs=0 l=0).mark_down
>>>> > 2019-01-04 07:01:38.485 7f809dd5b380  2 --
>>>> 192.168.253.148:0/1359135487 >> 192.168.253.148:0/1359135487
>>>> conn(0x332c000 :-1 s=STATE_NONE pgs=0 cs=0 l=0)._stop
>>>> > failed to fetch mon config (--no-mon-config to skip)
>>>> >
>>>> > --
>>>> > Randall Smith
>>>> > Computing Services
>>>> > Adams State University
>>>> > http://www.adams.edu/
>>>> > 719-587-7741
>>>> > _______________________________________________
>>>> > ceph-users mailing list
>>>> > ceph-users@lists.ceph.com
>>>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>
>>>
>>>
>>> --
>>> Randall Smith
>>> Computing Services
>>> Adams State University
>>> http://www.adams.edu/
>>> 719-587-7741
>>>
>>
>>
>> --
>> Randall Smith
>> Computing Services
>> Adams State University
>> http://www.adams.edu/
>> 719-587-7741
>>
>
>
> --
> Randall Smith
> Computing Services
> Adams State University
> http://www.adams.edu/
> 719-587-7741
>


-- 
Randall Smith
Computing Services
Adams State University
http://www.adams.edu/
719-587-7741
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
