Update to this -- I tried building a new host and a new OSD on a new disk,
and I am having the same issue.



I set the OSD debug level to 10 -- the issue looks like it's coming from a
mon daemon. I'm still trying to learn enough about the internals of Ceph to
understand what's happening here.
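
For reference, this is roughly what I put in ceph.conf on the OSD host to
get the output below (a sketch of my config, not a recommendation -- I
believe the debug ms = 1 line is what produces the messenger "<==" / "-->"
lines):

    [osd]
        debug osd = 10
        debug ms = 1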

Relevant debug logs (I think):


2017-07-25 14:21:58.889016 7f25a88af700  1 -- 10.0.15.142:6800/16150 <==
mon.1 10.0.15.51:6789/0 1 ==== mon_map magic: 0 v1 ==== 541+0+0 (2831459213
0 0) 0x556640ecd900 con 0x556641949800
2017-07-25 14:21:58.889109 7f25a88af700  1 -- 10.0.15.142:6800/16150 <==
mon.1 10.0.15.51:6789/0 2 ==== auth_reply(proto 2 0 (0) Success) v1 ====
33+0+0 (248727397 0 0) 0x556640ecdb80 con 0x556641949800
2017-07-25 14:21:58.889204 7f25a88af700  1 -- 10.0.15.142:6800/16150 -->
10.0.15.51:6789/0 -- auth(proto 2 32 bytes epoch 0) v1 -- 0x556640ecd400
con 0
2017-07-25 14:21:58.889966 7f25a88af700  1 -- 10.0.15.142:6800/16150 <==
mon.1 10.0.15.51:6789/0 3 ==== auth_reply(proto 2 0 (0) Success) v1 ====
206+0+0 (3141870879 0 0) 0x556640ecd400 con 0x556641949800
2017-07-25 14:21:58.890066 7f25a88af700  1 -- 10.0.15.142:6800/16150 -->
10.0.15.51:6789/0 -- auth(proto 2 165 bytes epoch 0) v1 -- 0x556640ecdb80
con 0
2017-07-25 14:21:58.890759 7f25a88af700  1 -- 10.0.15.142:6800/16150 <==
mon.1 10.0.15.51:6789/0 4 ==== auth_reply(proto 2 0 (0) Success) v1 ====
564+0+0 (1715764650 0 0) 0x556640ecdb80 con 0x556641949800
2017-07-25 14:21:58.890871 7f25a88af700  1 -- 10.0.15.142:6800/16150 -->
10.0.15.51:6789/0 -- mon_subscribe({monmap=0+}) v2 -- 0x556640e77680 con 0
2017-07-25 14:21:58.890901 7f25a88af700  1 -- 10.0.15.142:6800/16150 -->
10.0.15.51:6789/0 -- auth(proto 2 2 bytes epoch 0) v1 -- 0x556640ecd400 con
0
2017-07-25 14:21:58.891494 7f25a88af700  1 -- 10.0.15.142:6800/16150 <==
mon.1 10.0.15.51:6789/0 5 ==== mon_map magic: 0 v1 ==== 541+0+0 (2831459213
0 0) 0x556640ecde00 con 0x556641949800
2017-07-25 14:21:58.891555 7f25a88af700  1 -- 10.0.15.142:6800/16150 <==
mon.1 10.0.15.51:6789/0 6 ==== auth_reply(proto 2 0 (0) Success) v1 ====
194+0+0 (1036670921 0 0) 0x556640ece080 con 0x556641949800
2017-07-25 14:21:58.892003 7f25b5e71c80 10 osd.7 0 mon_cmd_maybe_osd_create
cmd: {"prefix": "osd crush set-device-class", "class": "hdd", "ids": ["7"]}
2017-07-25 14:21:58.892039 7f25b5e71c80  1 -- 10.0.15.142:6800/16150 -->
10.0.15.51:6789/0 -- mon_command({"prefix": "osd crush set-device-class",
"class": "hdd", "ids": ["7"]} v 0) v1 -- 0x556640e78d00 con 0
*2017-07-25 14:21:58.894596 7f25a88af700  1 -- 10.0.15.142:6800/16150 <==
mon.1 10.0.15.51:6789/0 7 ==== mon_command_ack([{"prefix": "osd crush
set-device-class", "class": "hdd", "ids": ["7"]}]=-2 (2) No such file or
directory v10406) v1 ==== 133+0+0 (3400959855 0 0) 0x556640ece300 con
0x556641949800*
2017-07-25 14:21:58.894797 7f25b5e71c80  1 -- 10.0.15.142:6800/16150 -->
10.0.15.51:6789/0 -- mon_command({"prefix": "osd create", "id": 7, "uuid":
"92445e4f-850e-453b-b5ab-569d1414f72d"} v 0) v1 -- 0x556640e79180 con 0
2017-07-25 14:21:58.896301 7f25a88af700  1 -- 10.0.15.142:6800/16150 <==
mon.1 10.0.15.51:6789/0 8 ==== mon_command_ack([{"prefix": "osd create",
"id": 7, "uuid": "92445e4f-850e-453b-b5ab-569d1414f72d"}]=0  v10406) v1
==== 115+0+2 (2540205126 0 1371665406) 0x556640ece580 con 0x556641949800
2017-07-25 14:21:58.896473 7f25b5e71c80 10 osd.7 0 mon_cmd_maybe_osd_create
cmd: {"prefix": "osd crush set-device-class", "class": "hdd", "ids": ["7"]}
2017-07-25 14:21:58.896516 7f25b5e71c80  1 -- 10.0.15.142:6800/16150 -->
10.0.15.51:6789/0 -- mon_command({"prefix": "osd crush set-device-class",
"class": "hdd", "ids": ["7"]} v 0) v1 -- 0x556640e793c0 con 0
*2017-07-25 14:21:58.898180 7f25a88af700  1 -- 10.0.15.142:6800/16150 <==
mon.1 10.0.15.51:6789/0 9 ==== mon_command_ack([{"prefix": "osd crush
set-device-class", "class": "hdd", "ids": ["7"]}]=-2 (2) No such file or
directory v10406) v1 ==== 133+0+0 (3400959855 0 0) 0x556640ecd900 con
0x556641949800*
*2017-07-25 14:21:58.898276 7f25b5e71c80 -1 osd.7 0
mon_cmd_maybe_osd_create fail: '(2) No such file or directory': (2) No such
file or directory*
2017-07-25 14:21:58.898380 7f25b5e71c80  1 -- 10.0.15.142:6800/16150 >>
10.0.15.51:6789/0 conn(0x556641949800 :-1 s=STATE_OPEN pgs=367879 cs=1
l=1).mark_down
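
So the mon ACKs the "osd create" command but returns -2 (ENOENT) to "osd
crush set-device-class" both times, and mon_cmd_maybe_osd_create treats
that as fatal and marks the mon connection down. If anyone wants to poke at
this from the CLI, I believe this is the equivalent of the failing mon
command, plus a couple of sanity checks on the crush side (Luminous-era
syntax; osd.7 is the new OSD):

    # CLI equivalent of the mon command the OSD sends at boot
    ceph osd crush set-device-class hdd osd.7

    # sanity checks
    ceph osd crush class ls      # do any device classes exist yet?
    ceph osd tree                # is osd.7 in the crush map at all?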




On Mon, Jul 24, 2017 at 1:33 PM, Daniel K <satha...@gmail.com> wrote:

> List --
>
> I have a 4-node cluster running on bare metal and need to use the kernel
> client on 2 of the nodes. Since I've read that you should not run the
> kernel client on a node that runs an OSD daemon, I decided to move the OSD
> daemons into a VM on the same machine.
>
> Original host is stor-vm2 (bare metal), new host is stor-vm2a (virtual).
>
> All went well -- I did these steps (for each OSD, 5 total per host; a
> rough shell sketch follows the list):
>
> - set up the VM
> - installed the OS
> - installed ceph (using ceph-deploy)
> - set noout
> - stopped the ceph OSD on the bare-metal host
> - unmounted /dev/sdb1 from /var/lib/ceph/osd/ceph-0
> - added /dev/sdb to the VM
> - ceph detected the OSD and started it automatically
> - moved the VM host to the same bucket as the physical host in the crushmap
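>
> Roughly, in shell form (a sketch from memory -- the virsh line assumes a
> libvirt VM, and the crush bucket name is a placeholder for whatever your
> tree actually uses):
>
>     ceph osd set noout                               # keep data in place
>     systemctl stop ceph-osd@0                        # on stor-vm2 (bare metal)
>     umount /var/lib/ceph/osd/ceph-0                  # release /dev/sdb1
>     virsh attach-disk stor-vm2a /dev/sdb vdb --live  # hand the disk to the VM
>     # inside stor-vm2a, ceph detects the OSD and starts it automatically
>     ceph osd crush move stor-vm2a root=default       # placeholder bucket
>     ceph osd unset noout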
>
> I did this for each OSD, and despite some recovery IO because of the
> updated crushmap, all OSDs were up.
>
> I rebooted the physical host, which rebooted the VM, and now the OSDs are
> refusing to start.
>
> I've tried moving them back to the bare metal host with the same results.
>
> Any ideas?
>
> Here are what seem to be the relevant osd log lines:
>
> 2017-07-24 13:21:53.561265 7faf1752fc80  0 osd.10 8854 crush map has
> features 2200130813952, adjusting msgr requires for clients
> 2017-07-24 13:21:53.561284 7faf1752fc80  0 osd.10 8854 crush map has
> features 2200130813952 was 8705, adjusting msgr requires for mons
> 2017-07-24 13:21:53.561298 7faf1752fc80  0 osd.10 8854 crush map has
> features 720578140510109696, adjusting msgr requires for osds
> 2017-07-24 13:21:55.626834 7faf1752fc80  0 osd.10 8854 load_pgs
> 2017-07-24 13:22:20.970222 7faf1752fc80  0 osd.10 8854 load_pgs opened 536
> pgs
> 2017-07-24 13:22:20.972659 7faf1752fc80  0 osd.10 8854 using
> weightedpriority op queue with priority op cut off at 64.
> 2017-07-24 13:22:20.976861 7faf1752fc80 -1 osd.10 8854 log_to_monitors
> {default=true}
> 2017-07-24 13:22:20.998233 7faf1752fc80 -1 osd.10 8854
> mon_cmd_maybe_osd_create fail: '(2) No such file or directory': (2) No such
> file or directory
> 2017-07-24 13:22:20.999165 7faf1752fc80  1 bluestore(/var/lib/ceph/osd/ceph-10) umount
> 2017-07-24 13:22:21.016146 7faf1752fc80  1 freelist shutdown
> 2017-07-24 13:22:21.016243 7faf1752fc80  4 rocksdb:
> [/build/ceph-12.1.1/src/rocksdb/db/db_impl.cc:217] Shutdown: canceling
> all background work
> 2017-07-24 13:22:21.020440 7faf1752fc80  4 rocksdb:
> [/build/ceph-12.1.1/src/rocksdb/db/db_impl.cc:343] Shutdown complete
> 2017-07-24 13:22:21.274481 7faf1752fc80  1 bluefs umount
> 2017-07-24 13:22:21.275822 7faf1752fc80  1 bdev(0x558bb1f82d80
> /var/lib/ceph/osd/ceph-10/block) close
> 2017-07-24 13:22:21.485226 7faf1752fc80  1 bdev(0x558bb1f82b40
> /var/lib/ceph/osd/ceph-10/block) close
> 2017-07-24 13:22:21.551009 7faf1752fc80 -1  ** ERROR: osd init failed: (2)
> No such file or directory
> 2017-07-24 13:22:21.563567 7faf1752fc80 -1 /build/ceph-12.1.1/src/common/HeartbeatMap.cc:
> In function 'ceph::HeartbeatMap::~HeartbeatMap()' thread 7faf1752fc80
> time 2017-07-24 13:22:21.558275
> /build/ceph-12.1.1/src/common/HeartbeatMap.cc: 39: FAILED assert(m_workers.empty())
>
>  ceph version 12.1.1 (f3e663a190bf2ed12c7e3cda288b9a159572c800) luminous
> (rc)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x102) [0x558ba6ba6b72]
>  2: (()+0xb81cf1) [0x558ba6cc0cf1]
>  3: (CephContext::~CephContext()+0x4d9) [0x558ba6ca77b9]
>  4: (CephContext::put()+0xe6) [0x558ba6ca7ab6]
>  5: (main()+0x563) [0x558ba650df73]
>  6: (__libc_start_main()+0xf0) [0x7faf14999830]
>  7: (_start()+0x29) [0x558ba6597cf9]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> to interpret this.
>
> --- begin dump of recent events ---
>
>
>
>