Update to this -- I tried building a new host and a new OSD on a new disk, and I am hitting the same issue.
I set the OSD debug level to 10 -- the issue looks like it's coming from a mon daemon. I'm still trying to learn enough about the internals of Ceph to understand what's happening here. Relevant debug logs (I think):

2017-07-25 14:21:58.889016 7f25a88af700 1 -- 10.0.15.142:6800/16150 <== mon.1 10.0.15.51:6789/0 1 ==== mon_map magic: 0 v1 ==== 541+0+0 (2831459213 0 0) 0x556640ecd900 con 0x556641949800
2017-07-25 14:21:58.889109 7f25a88af700 1 -- 10.0.15.142:6800/16150 <== mon.1 10.0.15.51:6789/0 2 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 33+0+0 (248727397 0 0) 0x556640ecdb80 con 0x556641949800
2017-07-25 14:21:58.889204 7f25a88af700 1 -- 10.0.15.142:6800/16150 --> 10.0.15.51:6789/0 -- auth(proto 2 32 bytes epoch 0) v1 -- 0x556640ecd400 con 0
2017-07-25 14:21:58.889966 7f25a88af700 1 -- 10.0.15.142:6800/16150 <== mon.1 10.0.15.51:6789/0 3 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 206+0+0 (3141870879 0 0) 0x556640ecd400 con 0x556641949800
2017-07-25 14:21:58.890066 7f25a88af700 1 -- 10.0.15.142:6800/16150 --> 10.0.15.51:6789/0 -- auth(proto 2 165 bytes epoch 0) v1 -- 0x556640ecdb80 con 0
2017-07-25 14:21:58.890759 7f25a88af700 1 -- 10.0.15.142:6800/16150 <== mon.1 10.0.15.51:6789/0 4 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 564+0+0 (1715764650 0 0) 0x556640ecdb80 con 0x556641949800
2017-07-25 14:21:58.890871 7f25a88af700 1 -- 10.0.15.142:6800/16150 --> 10.0.15.51:6789/0 -- mon_subscribe({monmap=0+}) v2 -- 0x556640e77680 con 0
2017-07-25 14:21:58.890901 7f25a88af700 1 -- 10.0.15.142:6800/16150 --> 10.0.15.51:6789/0 -- auth(proto 2 2 bytes epoch 0) v1 -- 0x556640ecd400 con 0
2017-07-25 14:21:58.891494 7f25a88af700 1 -- 10.0.15.142:6800/16150 <== mon.1 10.0.15.51:6789/0 5 ==== mon_map magic: 0 v1 ==== 541+0+0 (2831459213 0 0) 0x556640ecde00 con 0x556641949800
2017-07-25 14:21:58.891555 7f25a88af700 1 -- 10.0.15.142:6800/16150 <== mon.1 10.0.15.51:6789/0 6 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 194+0+0 (1036670921 0 0) 0x556640ece080 con 0x556641949800
2017-07-25 14:21:58.892003 7f25b5e71c80 10 osd.7 0 mon_cmd_maybe_osd_create cmd: {"prefix": "osd crush set-device-class", "class": "hdd", "ids": ["7"]}
2017-07-25 14:21:58.892039 7f25b5e71c80 1 -- 10.0.15.142:6800/16150 --> 10.0.15.51:6789/0 -- mon_command({"prefix": "osd crush set-device-class", "class": "hdd", "ids": ["7"]} v 0) v1 -- 0x556640e78d00 con 0
*2017-07-25 14:21:58.894596 7f25a88af700 1 -- 10.0.15.142:6800/16150 <== mon.1 10.0.15.51:6789/0 7 ==== mon_command_ack([{"prefix": "osd crush set-device-class", "class": "hdd", "ids": ["7"]}]=-2 (2) No such file or directory v10406) v1 ==== 133+0+0 (3400959855 0 0) 0x556640ece300 con 0x556641949800*
2017-07-25 14:21:58.894797 7f25b5e71c80 1 -- 10.0.15.142:6800/16150 --> 10.0.15.51:6789/0 -- mon_command({"prefix": "osd create", "id": 7, "uuid": "92445e4f-850e-453b-b5ab-569d1414f72d"} v 0) v1 -- 0x556640e79180 con 0
2017-07-25 14:21:58.896301 7f25a88af700 1 -- 10.0.15.142:6800/16150 <== mon.1 10.0.15.51:6789/0 8 ==== mon_command_ack([{"prefix": "osd create", "id": 7, "uuid": "92445e4f-850e-453b-b5ab-569d1414f72d"}]=0 v10406) v1 ==== 115+0+2 (2540205126 0 1371665406) 0x556640ece580 con 0x556641949800
2017-07-25 14:21:58.896473 7f25b5e71c80 10 osd.7 0 mon_cmd_maybe_osd_create cmd: {"prefix": "osd crush set-device-class", "class": "hdd", "ids": ["7"]}
2017-07-25 14:21:58.896516 7f25b5e71c80 1 -- 10.0.15.142:6800/16150 --> 10.0.15.51:6789/0 -- mon_command({"prefix": "osd crush set-device-class", "class": "hdd", "ids": ["7"]} v 0) v1 -- 0x556640e793c0 con 0
*2017-07-25 14:21:58.898180 7f25a88af700 1 -- 10.0.15.142:6800/16150 <== mon.1 10.0.15.51:6789/0 9 ==== mon_command_ack([{"prefix": "osd crush set-device-class", "class": "hdd", "ids": ["7"]}]=-2 (2) No such file or directory v10406) v1 ==== 133+0+0 (3400959855 0 0) 0x556640ecd900 con 0x556641949800*
*2017-07-25 14:21:58.898276 7f25b5e71c80 -1 osd.7 0 mon_cmd_maybe_osd_create fail: '(2) No such file or directory': (2) No such file or directory*
2017-07-25 14:21:58.898380 7f25b5e71c80 1 -- 10.0.15.142:6800/16150 >> 10.0.15.51:6789/0 conn(0x556641949800 :-1 s=STATE_OPEN pgs=367879 cs=1 l=1).mark_down
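In case it helps, my next step is to try the same thing by hand from an admin node and see whether osd.7 (the OSD the logs above are complaining about) is actually present in the crush map. Roughly something like this, if I'm reading the Luminous docs right (commands from memory, so double-check the syntax):

    ceph osd tree                               # is osd.7 listed at all, and under which host bucket?
    ceph osd crush dump                         # does osd.7 show up in the devices section?
    ceph osd crush class ls                     # is the "hdd" device class known to the cluster?
    ceph osd crush set-device-class hdd osd.7   # what I think is the CLI equivalent of the command the OSD sends at startup

If the manual set-device-class also comes back with '(2) No such file or directory', that would point at the mon/crush side rather than anything on the new OSD host.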
On Mon, Jul 24, 2017 at 1:33 PM, Daniel K <satha...@gmail.com> wrote:
> List --
>
> I have a 4-node cluster running on bare metal and have a need to use the
> kernel client on 2 nodes. As I read that you should not run the kernel
> client on a node that runs an OSD daemon, I decided to move the OSD
> daemons into a VM on the same device.
>
> Original host is stor-vm2 (bare metal), new host is stor-vm2a (virtual).
>
> All went well -- I did these steps (for each OSD, 5 total per host):
>
> - setup the VM
> - install the OS
> - installed ceph (using ceph-deploy)
> - set noout
> - stopped ceph osd on bare metal host
> - unmount /dev/sdb1 from /var/lib/ceph/osd/ceph-0
> - add /dev/sdb to the VM
> - ceph detected the osd and started automatically.
> - moved VM host to the same bucket as physical host in crushmap
>
> I did this for each OSD, and despite some recovery IO because of the
> updated crushmap, all OSDs were up.
>
> I rebooted the physical host, which rebooted the VM, and now the OSDs
> are refusing to start.
>
> I've tried moving them back to the bare metal host with the same results.
>
> Any ideas?
>
> Here are what seem to be the relevant osd log lines:
>
> 2017-07-24 13:21:53.561265 7faf1752fc80 0 osd.10 8854 crush map has features 2200130813952, adjusting msgr requires for clients
> 2017-07-24 13:21:53.561284 7faf1752fc80 0 osd.10 8854 crush map has features 2200130813952 was 8705, adjusting msgr requires for mons
> 2017-07-24 13:21:53.561298 7faf1752fc80 0 osd.10 8854 crush map has features 720578140510109696, adjusting msgr requires for osds
> 2017-07-24 13:21:55.626834 7faf1752fc80 0 osd.10 8854 load_pgs
> 2017-07-24 13:22:20.970222 7faf1752fc80 0 osd.10 8854 load_pgs opened 536 pgs
> 2017-07-24 13:22:20.972659 7faf1752fc80 0 osd.10 8854 using weightedpriority op queue with priority op cut off at 64.
> 2017-07-24 13:22:20.976861 7faf1752fc80 -1 osd.10 8854 log_to_monitors {default=true}
> 2017-07-24 13:22:20.998233 7faf1752fc80 -1 osd.10 8854 mon_cmd_maybe_osd_create fail: '(2) No such file or directory': (2) No such file or directory
> 2017-07-24 13:22:20.999165 7faf1752fc80 1 bluestore(/var/lib/ceph/osd/ceph-10) umount
> 2017-07-24 13:22:21.016146 7faf1752fc80 1 freelist shutdown
> 2017-07-24 13:22:21.016243 7faf1752fc80 4 rocksdb: [/build/ceph-12.1.1/src/rocksdb/db/db_impl.cc:217] Shutdown: canceling all background work
> 2017-07-24 13:22:21.020440 7faf1752fc80 4 rocksdb: [/build/ceph-12.1.1/src/rocksdb/db/db_impl.cc:343] Shutdown complete
> 2017-07-24 13:22:21.274481 7faf1752fc80 1 bluefs umount
> 2017-07-24 13:22:21.275822 7faf1752fc80 1 bdev(0x558bb1f82d80 /var/lib/ceph/osd/ceph-10/block) close
> 2017-07-24 13:22:21.485226 7faf1752fc80 1 bdev(0x558bb1f82b40 /var/lib/ceph/osd/ceph-10/block) close
> 2017-07-24 13:22:21.551009 7faf1752fc80 -1 ** ERROR: osd init failed: (2) No such file or directory
> 2017-07-24 13:22:21.563567 7faf1752fc80 -1 /build/ceph-12.1.1/src/common/HeartbeatMap.cc: In function 'ceph::HeartbeatMap::~HeartbeatMap()' thread 7faf1752fc80 time 2017-07-24 13:22:21.558275
> /build/ceph-12.1.1/src/common/HeartbeatMap.cc: 39: FAILED assert(m_workers.empty())
>
> ceph version 12.1.1 (f3e663a190bf2ed12c7e3cda288b9a159572c800) luminous (rc)
> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x558ba6ba6b72]
> 2: (()+0xb81cf1) [0x558ba6cc0cf1]
> 3: (CephContext::~CephContext()+0x4d9) [0x558ba6ca77b9]
> 4: (CephContext::put()+0xe6) [0x558ba6ca7ab6]
> 5: (main()+0x563) [0x558ba650df73]
> 6: (__libc_start_main()+0xf0) [0x7faf14999830]
> 7: (_start()+0x29) [0x558ba6597cf9]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> --- begin dump of recent events ---
>
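For completeness, a few more checks I still want to run against one of the OSDs from the quoted message above (osd.10 in that log), to confirm it still exists in the osdmap, the crush map, and the auth database after being moved back and forth (again, commands from memory, so double-check the syntax):

    ceph osd find 10        # is osd.10 still in the osdmap, and where does crush think it lives?
    ceph osd metadata 10    # does the cluster still have metadata for this OSD?
    ceph auth get osd.10    # does its cephx key still exist, and does it match the keyring on disk?

If any of those error out for the moved OSDs, that would at least line up with the '(2) No such file or directory' the mon keeps returning.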