Bryan,

Once the rest of the cluster was updated to v0.94.5, the OSDs on the one host running infernalis v9.2.0 now appear to be booting.
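For anyone else who lands on this thread: the quickest check is whether 'ceph tell osd.* version' reports more than one version across the cluster. If you want to script that check, here is a minimal sketch; the sample output is hardcoded for illustration (modeled on the output quoted later in this thread, with one placeholder commit hash), and the regex assumes that JSON-ish line format.

```python
import re

def version_counts(tell_output: str) -> dict:
    """Count distinct ceph versions in 'ceph tell osd.* version' output.

    Expects lines containing: "version": "ceph version X.Y.Z (hash)"
    Unreachable OSDs (e.g. 'Error ENXIO ...') produce no match and are skipped.
    """
    counts = {}
    for match in re.finditer(r'"version":\s*"ceph version ([\w.]+)', tell_output):
        v = match.group(1)
        counts[v] = counts.get(v, 0) + 1
    return counts

# Hardcoded sample, not live cluster data; the 0.94.1 hash is a placeholder.
sample = '''
osd.10: Error ENXIO: problem getting command descriptions from osd.10
osd.1: { "version": "ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)" }
osd.2: { "version": "ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)" }
osd.3: { "version": "ceph version 0.94.1 (xxxxxxxx)" }
'''

counts = version_counts(sample)
print(counts)
if len(counts) > 1:
    print("WARNING: mixed versions; finish the point-release upgrade "
          "on every daemon before moving to infernalis")
```

In practice you would feed it `ceph tell osd.* version 2>&1` captured from the cluster instead of the sample string.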
Bob

On Fri, Dec 18, 2015 at 3:44 PM, Bob R <b...@drinksbeer.org> wrote:

> Bryan,
>
> I rebooted another host which wasn't updated to CentOS 7.2 and those OSDs
> also failed to come out of booting state. I thought I'd restarted each OSD
> host after upgrading them to infernalis but I must have been mistaken, and
> after running 'ceph tell osd.* version' I saw we were on a mix of v0.94.1,
> v0.94.2, v0.94.4, and v0.94.5. I've downgraded the two hosts we were having
> problems with to hammer v0.94.5, and once the cluster is happy again we
> will try upgrading again.
>
> Good luck.
>
> Bob
>
> On Fri, Dec 18, 2015 at 3:21 PM, Stillwell, Bryan
> <bryan.stillw...@twcable.com> wrote:
>
>> I ran into a similar problem while in the middle of upgrading from Hammer
>> (0.94.5) to Infernalis (9.2.0). I decided to try rebuilding one of the
>> OSDs by using 'ceph-disk prepare /dev/sdb' and it never comes up:
>>
>> root@b3:~# ceph daemon osd.10 status
>> {
>>     "cluster_fsid": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
>>     "osd_fsid": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
>>     "whoami": 10,
>>     "state": "booting",
>>     "oldest_map": 25804,
>>     "newest_map": 25904,
>>     "num_pgs": 0
>> }
>>
>> Here's what is written to /var/log/ceph/osd/ceph-osd.10.log:
>>
>> 2015-12-18 16:09:48.928462 7fd5e2bec940  0 ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299), process ceph-osd, pid 6866
>> 2015-12-18 16:09:48.931387 7fd5e2bec940  1 filestore(/var/lib/ceph/tmp/mnt.IOnlxY) mkfs in /var/lib/ceph/tmp/mnt.IOnlxY
>> 2015-12-18 16:09:48.931417 7fd5e2bec940  1 filestore(/var/lib/ceph/tmp/mnt.IOnlxY) mkfs fsid is already set to xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
>> 2015-12-18 16:09:48.931422 7fd5e2bec940  1 filestore(/var/lib/ceph/tmp/mnt.IOnlxY) write_version_stamp 4
>> 2015-12-18 16:09:48.932671 7fd5e2bec940  0 filestore(/var/lib/ceph/tmp/mnt.IOnlxY) backend xfs (magic 0x58465342)
>> 2015-12-18 16:09:48.934953 7fd5e2bec940  1 filestore(/var/lib/ceph/tmp/mnt.IOnlxY) leveldb db exists/created
>> 2015-12-18 16:09:48.935082 7fd5e2bec940  1 journal _open /var/lib/ceph/tmp/mnt.IOnlxY/journal fd 11: 1072693248 bytes, block size 4096 bytes, directio = 1, aio = 1
>> 2015-12-18 16:09:48.935218 7fd5e2bec940 -1 journal check: ondisk fsid 00000000-0000-0000-0000-000000000000 doesn't match expected xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx, invalid (someone else's?) journal
>> 2015-12-18 16:09:48.935227 7fd5e2bec940  1 journal close /var/lib/ceph/tmp/mnt.IOnlxY/journal
>> 2015-12-18 16:09:48.935452 7fd5e2bec940  1 journal _open /var/lib/ceph/tmp/mnt.IOnlxY/journal fd 11: 1072693248 bytes, block size 4096 bytes, directio = 1, aio = 1
>> 2015-12-18 16:09:48.935771 7fd5e2bec940  0 filestore(/var/lib/ceph/tmp/mnt.IOnlxY) mkjournal created journal on /var/lib/ceph/tmp/mnt.IOnlxY/journal
>> 2015-12-18 16:09:48.935803 7fd5e2bec940  1 filestore(/var/lib/ceph/tmp/mnt.IOnlxY) mkfs done in /var/lib/ceph/tmp/mnt.IOnlxY
>> 2015-12-18 16:09:48.935919 7fd5e2bec940  0 filestore(/var/lib/ceph/tmp/mnt.IOnlxY) backend xfs (magic 0x58465342)
>> 2015-12-18 16:09:48.936548 7fd5e2bec940  0 genericfilestorebackend(/var/lib/ceph/tmp/mnt.IOnlxY) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
>> 2015-12-18 16:09:48.936559 7fd5e2bec940  0 genericfilestorebackend(/var/lib/ceph/tmp/mnt.IOnlxY) detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
>> 2015-12-18 16:09:48.936588 7fd5e2bec940  0 genericfilestorebackend(/var/lib/ceph/tmp/mnt.IOnlxY) detect_features: splice is supported
>> 2015-12-18 16:09:48.938319 7fd5e2bec940  0 genericfilestorebackend(/var/lib/ceph/tmp/mnt.IOnlxY) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
>> 2015-12-18 16:09:48.938394 7fd5e2bec940  0 xfsfilestorebackend(/var/lib/ceph/tmp/mnt.IOnlxY) detect_features: extsize is supported and your kernel >= 3.5
>> 2015-12-18 16:09:48.940420 7fd5e2bec940  0 filestore(/var/lib/ceph/tmp/mnt.IOnlxY) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
>> 2015-12-18 16:09:48.940646 7fd5e2bec940  1 journal _open /var/lib/ceph/tmp/mnt.IOnlxY/journal fd 17: 1072693248 bytes, block size 4096 bytes, directio = 1, aio = 1
>> 2015-12-18 16:09:48.940865 7fd5e2bec940  1 journal _open /var/lib/ceph/tmp/mnt.IOnlxY/journal fd 17: 1072693248 bytes, block size 4096 bytes, directio = 1, aio = 1
>> 2015-12-18 16:09:48.941270 7fd5e2bec940  1 filestore(/var/lib/ceph/tmp/mnt.IOnlxY) upgrade
>> 2015-12-18 16:09:48.941389 7fd5e2bec940 -1 filestore(/var/lib/ceph/tmp/mnt.IOnlxY) could not find -1/23c2fcde/osd_superblock/0 in index: (2) No such file or directory
>> 2015-12-18 16:09:48.945392 7fd5e2bec940  1 journal close /var/lib/ceph/tmp/mnt.IOnlxY/journal
>> 2015-12-18 16:09:48.946175 7fd5e2bec940 -1 created object store /var/lib/ceph/tmp/mnt.IOnlxY journal /var/lib/ceph/tmp/mnt.IOnlxY/journal for osd.10 fsid xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
>> 2015-12-18 16:09:48.946269 7fd5e2bec940 -1 auth: error reading file: /var/lib/ceph/tmp/mnt.IOnlxY/keyring: can't open /var/lib/ceph/tmp/mnt.IOnlxY/keyring: (2) No such file or directory
>> 2015-12-18 16:09:48.946623 7fd5e2bec940 -1 created new key in keyring /var/lib/ceph/tmp/mnt.IOnlxY/keyring
>> 2015-12-18 16:09:50.698753 7fb5db130940  0 ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299), process ceph-osd, pid 7045
>> 2015-12-18 16:09:50.745427 7fb5db130940  0 filestore(/var/lib/ceph/osd/ceph-10) backend xfs (magic 0x58465342)
>> 2015-12-18 16:09:50.745978 7fb5db130940  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-10) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
>> 2015-12-18 16:09:50.745987 7fb5db130940  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-10) detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
>> 2015-12-18 16:09:50.746012 7fb5db130940  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-10) detect_features: splice is supported
>> 2015-12-18 16:09:50.746517 7fb5db130940  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-10) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
>> 2015-12-18 16:09:50.746616 7fb5db130940  0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-10) detect_features: extsize is supported and your kernel >= 3.5
>> 2015-12-18 16:09:50.748775 7fb5db130940  0 filestore(/var/lib/ceph/osd/ceph-10) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
>> 2015-12-18 16:09:50.749005 7fb5db130940  1 journal _open /var/lib/ceph/osd/ceph-10/journal fd 19: 1072693248 bytes, block size 4096 bytes, directio = 1, aio = 1
>> 2015-12-18 16:09:50.749256 7fb5db130940  1 journal _open /var/lib/ceph/osd/ceph-10/journal fd 19: 1072693248 bytes, block size 4096 bytes, directio = 1, aio = 1
>> 2015-12-18 16:09:50.749632 7fb5db130940  1 filestore(/var/lib/ceph/osd/ceph-10) upgrade
>> 2015-12-18 16:09:50.783188 7fb5db130940  0 <cls> cls/cephfs/cls_cephfs.cc:136: loading cephfs_size_scan
>> 2015-12-18 16:09:50.851735 7fb5db130940  0 <cls> cls/hello/cls_hello.cc:305: loading cls_hello
>> 2015-12-18 16:09:50.851807 7fb5db130940  0 osd.10 0 crush map has features 33816576, adjusting msgr requires for clients
>> 2015-12-18 16:09:50.851818 7fb5db130940  0 osd.10 0 crush map has features 33816576 was 8705, adjusting msgr requires for mons
>> 2015-12-18 16:09:50.851821 7fb5db130940  0 osd.10 0 crush map has features 33816576, adjusting msgr requires for osds
>> 2015-12-18 16:09:50.851965 7fb5db130940  0 osd.10 0 load_pgs
>> 2015-12-18 16:09:50.851988 7fb5db130940  0 osd.10 0 load_pgs opened 0 pgs
>> 2015-12-18 16:09:50.852822 7fb5db130940 -1 osd.10 0 log_to_monitors {default=true}
>> 2015-12-18 16:09:50.870133 7fb5c7f39700  0 osd.10 0 ignoring osdmap until we have initialized
>> 2015-12-18 16:09:50.870409 7fb5db130940  0 osd.10 0 done with init, starting boot process
>> 2015-12-18 16:09:50.873357 7fb5c7f39700  0 osd.10 25804 crush map has features 104186773504, adjusting msgr requires for clients
>> 2015-12-18 16:09:50.873368 7fb5c7f39700  0 osd.10 25804 crush map has features 379064680448 was 33825281, adjusting msgr requires for mons
>> 2015-12-18 16:09:50.873374 7fb5c7f39700  0 osd.10 25804 crush map has features 379064680448, adjusting msgr requires for osds
>> 2015-12-18 16:09:50.873377 7fb5c7f39700  0 osd.10 25804 check_osdmap_features enabling on-disk ERASURE CODES compat feature
>> 2015-12-18 16:09:50.876187 7fb5c7f39700  0 log_channel(cluster) log [WRN] : failed to encode map e25805 with expected crc
>> 2015-12-18 16:09:50.879534 7fb5c7f39700  0 log_channel(cluster) log [WRN] : failed to encode map e25805 with expected crc
>> 2015-12-18 16:09:50.950405 7fb5c7f39700  0 log_channel(cluster) log [WRN] : failed to encode map e25905 with expected crc
>> 2015-12-18 16:09:50.983355 7fb5c7f39700  0 log_channel(cluster) log [WRN] : failed to encode map e25905 with expected crc
>>
>> I'm running this on Ubuntu 14.04.3 with the linux-image-generic-lts-wily
>> kernel (4.2.0-21.25~14.04.1).
>>
>> Are you running a mixed cluster right now too? For example this is my
>> cluster right now:
>>
>> root@b1:~# ceph tell osd.* version | grep version | uniq -c
>> osd.10: Error ENXIO: problem getting command descriptions from osd.10
>> osd.10: problem getting command descriptions from osd.10
>>     11     "version": "ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)"
>>     15     "version": "ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)"
>>
>> Bryan
>>
>> From: ceph-users <ceph-users-boun...@lists.ceph.com> on behalf of Bob R <b...@drinksbeer.org>
>> Date: Wednesday, December 16, 2015 at 11:45 AM
>> To: ceph-users <ceph-users@lists.ceph.com>
>> Subject: [ceph-users] OSDs stuck in booting state on CentOS 7.2.1511 and ceph infernalis 9.2.0
>>
>> We've been operating a cluster relatively incident-free since 0.86. On
>> Monday I did a yum update on one node, ceph00, and after rebooting we're
>> seeing every OSD stuck in 'booting' state. I've tried removing all of the
>> OSDs and recreating them with ceph-deploy (ceph-disk required modification
>> to use partx -a rather than partprobe) but we see the same status. I'm not
>> sure how to troubleshoot this further. Our OSDs on this host are now
>> running as the ceph user, which may be related to the issue, as the other
>> three hosts are running as root (although I followed the steps listed to
>> upgrade from hammer to infernalis and did chown -R ceph:ceph /var/lib/ceph
>> on each node).
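The chown -R ceph:ceph step mentioned above is easy to apply unevenly (the thread itself shows one host's OSDs running as ceph while the others still run as root), so it can help to verify that nothing under an OSD data directory was left with the wrong owner. A minimal sketch; the path and user in the commented usage are the usual defaults, not something confirmed by this thread:

```python
import os
import pwd

def files_not_owned_by(root: str, user: str) -> list:
    """List paths under root whose owner uid differs from the given user's.

    After a hammer -> infernalis upgrade, everything under /var/lib/ceph
    is expected to belong to the ceph user; stray root-owned files are a
    plausible reason for an OSD failing to start as the ceph user.
    """
    uid = pwd.getpwnam(user).pw_uid
    mismatched = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # lstat so symlinks are checked themselves, not their targets
            if os.lstat(path).st_uid != uid:
                mismatched.append(path)
    return mismatched

# Typical use on an OSD host (run as root):
# for p in files_not_owned_by("/var/lib/ceph", "ceph"):
#     print("wrong owner:", p)
```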
>>
>> [root@ceph00 ceph]# lsb_release -idrc
>> Distributor ID: CentOS
>> Description:    CentOS Linux release 7.2.1511 (Core)
>> Release:        7.2.1511
>> Codename:       Core
>>
>> [root@ceph00 ceph]# ceph --version
>> ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)
>>
>> [root@ceph00 ceph]# ceph daemon osd.0 status
>> {
>>     "cluster_fsid": "2e4ea2c0-fb62-41fa-b7b7-e34d759b851e",
>>     "osd_fsid": "ddf659ad-a3db-4094-b4d0-7d50f34b8f75",
>>     "whoami": 0,
>>     "state": "booting",
>>     "oldest_map": 25243,
>>     "newest_map": 26610,
>>     "num_pgs": 0
>> }
>>
>> [root@ceph00 ceph]# ceph daemon osd.3 status
>> {
>>     "cluster_fsid": "2e4ea2c0-fb62-41fa-b7b7-e34d759b851e",
>>     "osd_fsid": "8b1acd8a-645d-4dc2-8c1d-6dbb1715265f",
>>     "whoami": 3,
>>     "state": "booting",
>>     "oldest_map": 25243,
>>     "newest_map": 26612,
>>     "num_pgs": 0
>> }
>>
>> [root@ceph00 ceph]# ceph osd tree
>> ID  WEIGHT    TYPE NAME            UP/DOWN REWEIGHT PRIMARY-AFFINITY
>> -23   1.43999 root ssd
>> -19         0     host ceph00_ssd
>> -20   0.48000     host ceph01_ssd
>>  40   0.48000         osd.40            up  1.00000          1.00000
>> -21   0.48000     host ceph02_ssd
>>  43   0.48000         osd.43            up  1.00000          1.00000
>> -22   0.48000     host ceph03_ssd
>>  41   0.48000         osd.41            up  1.00000          1.00000
>>  -1 120.00000 root default
>> -17  80.00000     room b1
>> -14  40.00000         host ceph01
>>   1   4.00000             osd.1         up  1.00000          1.00000
>>   4   4.00000             osd.4         up  1.00000          1.00000
>>  18   4.00000             osd.18        up  1.00000          1.00000
>>  19   4.00000             osd.19        up  1.00000          1.00000
>>  20   4.00000             osd.20        up  1.00000          1.00000
>>  21   4.00000             osd.21        up  1.00000          1.00000
>>  22   4.00000             osd.22        up  1.00000          1.00000
>>  23   4.00000             osd.23        up  1.00000          1.00000
>>  24   4.00000             osd.24        up  1.00000          1.00000
>>  25   4.00000             osd.25        up  1.00000          1.00000
>> -16  40.00000         host ceph03
>>  30   4.00000             osd.30        up  1.00000          1.00000
>>  31   4.00000             osd.31        up  1.00000          1.00000
>>  32   4.00000             osd.32        up  1.00000          1.00000
>>  33   4.00000             osd.33        up  1.00000          1.00000
>>  34   4.00000             osd.34        up  1.00000          1.00000
>>  35   4.00000             osd.35        up  1.00000          1.00000
>>  36   4.00000             osd.36        up  1.00000          1.00000
>>  37   4.00000             osd.37        up  1.00000          1.00000
>>  38   4.00000             osd.38        up  1.00000          1.00000
>>  39   4.00000             osd.39        up  1.00000          1.00000
>> -18  40.00000     room b2
>> -13         0         host ceph00
>> -15  40.00000         host ceph02
>>   2   4.00000             osd.2         up  1.00000          1.00000
>>   5   4.00000             osd.5         up  1.00000          1.00000
>>  14   4.00000             osd.14        up  1.00000          1.00000
>>  15   4.00000             osd.15        up  1.00000          1.00000
>>  16   4.00000             osd.16        up  1.00000          1.00000
>>  17   4.00000             osd.17        up  1.00000          1.00000
>>  26   4.00000             osd.26        up  1.00000          1.00000
>>  27   4.00000             osd.27        up  1.00000          1.00000
>>  28   4.00000             osd.28        up  1.00000          1.00000
>>  29   4.00000             osd.29        up  1.00000          1.00000
>>   0         0 osd.0                   down        0          1.00000
>>   3         0 osd.3                   down        0          1.00000
>>   6         0 osd.6                   down        0          1.00000
>>   7         0 osd.7                   down        0          1.00000
>>   8         0 osd.8                   down        0          1.00000
>>   9         0 osd.9                   down        0          1.00000
>>  10         0 osd.10                  down        0          1.00000
>>  11         0 osd.11                  down        0          1.00000
>>  12         0 osd.12                  down        0          1.00000
>>  13         0 osd.13                  down        0          1.00000
>>
>> Any assistance is greatly appreciated.
>>
>> Bob
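On a cluster this size, picking the down OSDs out of 'ceph osd tree' by eye gets tedious. A short sketch that pulls them out of captured plain-text output; the column positions are assumed from the hammer/infernalis-era output quoted in this thread, and the sample below is a hardcoded excerpt, not live data:

```python
def down_osds(tree_output: str) -> list:
    """Extract the ids of OSDs reported 'down' from 'ceph osd tree' output.

    Assumes the column order shown in this thread:
    ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
    Host/room/root rows have 'host'/'room'/'root' in the NAME position
    and are skipped.
    """
    down = []
    for line in tree_output.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[2].startswith("osd.") and fields[3] == "down":
            down.append(fields[2])
    return down

# Hardcoded excerpt modeled on the tree above.
sample = """\
-13        0     host ceph00
  2  4.00000         osd.2      up  1.00000          1.00000
  0        0 osd.0            down        0          1.00000
  3        0 osd.3            down        0          1.00000
"""

print(down_osds(sample))  # ['osd.0', 'osd.3']
```

In practice you would capture the output with something like `subprocess.check_output(["ceph", "osd", "tree"])` on a node with admin credentials.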
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com