Bryan,

Once the rest of the cluster was updated to v0.94.5, the OSDs on the one
host running Infernalis v9.2.0 now appear to be booting properly.
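
For anyone who hits the same thing: the mixed versions were easy to miss, and a one-liner makes them obvious. A sketch — 'ceph tell osd.* version' is the real command; the wrapper function and the sort (so 'uniq -c' groups reliably) are my additions:

```shell
# Count how many OSDs report each Ceph version. The sort is needed so that
# 'uniq -c' groups identical version strings even when OSD ids interleave.
summarize_versions() {
    grep '"version"' | sort | uniq -c | sed 's/^ *//'
}

# Typical use, against a live cluster with an admin keyring:
#   ceph tell osd.* version | summarize_versions
```

Anything other than a single line of output means the upgrade isn't finished.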

Bob

On Fri, Dec 18, 2015 at 3:44 PM, Bob R <b...@drinksbeer.org> wrote:

> Bryan,
>
> I rebooted another host which hadn't been updated to CentOS 7.2, and those
> OSDs also failed to come out of the booting state. I thought I'd restarted
> each OSD host after upgrading them to Infernalis, but I must have been
> mistaken: after running 'ceph tell osd.* version' I saw we were on a mix of
> v0.94.1, v0.94.2, v0.94.4, and v0.94.5. I've downgraded the two hosts we
> were having problems with to hammer v0.94.5, and once the cluster is
> healthy again we will try upgrading again.
>
> Good luck.
>
> Bob
>
> On Fri, Dec 18, 2015 at 3:21 PM, Stillwell, Bryan <
> bryan.stillw...@twcable.com> wrote:
>
>> I ran into a similar problem while in the middle of upgrading from Hammer
>> (0.94.5) to Infernalis (9.2.0).  I decided to try rebuilding one of the
>> OSDs with 'ceph-disk prepare /dev/sdb', but it never comes up:
>>
>> root@b3:~# ceph daemon osd.10 status
>> {
>>     "cluster_fsid": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
>>     "osd_fsid": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
>>     "whoami": 10,
>>     "state": "booting",
>>     "oldest_map": 25804,
>>     "newest_map": 25904,
>>     "num_pgs": 0
>> }
>>
>> Here's what is written to /var/log/ceph/osd/ceph-osd.10.log:
>>
>> 2015-12-18 16:09:48.928462 7fd5e2bec940  0 ceph version 9.2.0
>> (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299), process ceph-osd, pid 6866
>> 2015-12-18 16:09:48.931387 7fd5e2bec940  1
>> filestore(/var/lib/ceph/tmp/mnt.IOnlxY) mkfs in /var/lib/ceph/tmp/mnt.IOnlxY
>> 2015-12-18 16:09:48.931417 7fd5e2bec940  1
>> filestore(/var/lib/ceph/tmp/mnt.IOnlxY) mkfs fsid is already set to
>> xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
>> 2015-12-18 16:09:48.931422 7fd5e2bec940  1
>> filestore(/var/lib/ceph/tmp/mnt.IOnlxY) write_version_stamp 4
>> 2015-12-18 16:09:48.932671 7fd5e2bec940  0
>> filestore(/var/lib/ceph/tmp/mnt.IOnlxY) backend xfs (magic 0x58465342)
>> 2015-12-18 16:09:48.934953 7fd5e2bec940  1
>> filestore(/var/lib/ceph/tmp/mnt.IOnlxY) leveldb db exists/created
>> 2015-12-18 16:09:48.935082 7fd5e2bec940  1 journal _open
>> /var/lib/ceph/tmp/mnt.IOnlxY/journal fd 11: 1072693248 bytes, block size
>> 4096 bytes, directio = 1, aio = 1
>> 2015-12-18 16:09:48.935218 7fd5e2bec940 -1 journal check: ondisk fsid
>> 00000000-0000-0000-0000-000000000000 doesn't match
>> expected xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx, invalid (someone else's?)
>> journal
>> 2015-12-18 16:09:48.935227 7fd5e2bec940  1 journal close
>> /var/lib/ceph/tmp/mnt.IOnlxY/journal
>> 2015-12-18 16:09:48.935452 7fd5e2bec940  1 journal _open
>> /var/lib/ceph/tmp/mnt.IOnlxY/journal fd 11: 1072693248 bytes, block size
>> 4096 bytes, directio = 1, aio = 1
>> 2015-12-18 16:09:48.935771 7fd5e2bec940  0
>> filestore(/var/lib/ceph/tmp/mnt.IOnlxY) mkjournal created journal on
>> /var/lib/ceph/tmp/mnt.IOnlxY/journal
>> 2015-12-18 16:09:48.935803 7fd5e2bec940  1
>> filestore(/var/lib/ceph/tmp/mnt.IOnlxY) mkfs done in
>> /var/lib/ceph/tmp/mnt.IOnlxY
>> 2015-12-18 16:09:48.935919 7fd5e2bec940  0
>> filestore(/var/lib/ceph/tmp/mnt.IOnlxY) backend xfs (magic 0x58465342)
>> 2015-12-18 16:09:48.936548 7fd5e2bec940  0
>> genericfilestorebackend(/var/lib/ceph/tmp/mnt.IOnlxY) detect_features:
>> FIEMAP ioctl is disabled via 'filestore fiemap' config option
>> 2015-12-18 16:09:48.936559 7fd5e2bec940  0
>> genericfilestorebackend(/var/lib/ceph/tmp/mnt.IOnlxY) detect_features:
>> SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
>> 2015-12-18 16:09:48.936588 7fd5e2bec940  0
>> genericfilestorebackend(/var/lib/ceph/tmp/mnt.IOnlxY) detect_features:
>> splice is supported
>> 2015-12-18 16:09:48.938319 7fd5e2bec940  0
>> genericfilestorebackend(/var/lib/ceph/tmp/mnt.IOnlxY) detect_features:
>> syncfs(2) syscall fully supported (by glibc and kernel)
>> 2015-12-18 16:09:48.938394 7fd5e2bec940  0
>> xfsfilestorebackend(/var/lib/ceph/tmp/mnt.IOnlxY) detect_features: extsize
>> is supported and your kernel >= 3.5
>> 2015-12-18 16:09:48.940420 7fd5e2bec940  0
>> filestore(/var/lib/ceph/tmp/mnt.IOnlxY) mount: enabling WRITEAHEAD journal
>> mode: checkpoint is not enabled
>> 2015-12-18 16:09:48.940646 7fd5e2bec940  1 journal _open
>> /var/lib/ceph/tmp/mnt.IOnlxY/journal fd 17: 1072693248 bytes, block size
>> 4096 bytes, directio = 1, aio = 1
>> 2015-12-18 16:09:48.940865 7fd5e2bec940  1 journal _open
>> /var/lib/ceph/tmp/mnt.IOnlxY/journal fd 17: 1072693248 bytes, block size
>> 4096 bytes, directio = 1, aio = 1
>> 2015-12-18 16:09:48.941270 7fd5e2bec940  1
>> filestore(/var/lib/ceph/tmp/mnt.IOnlxY) upgrade
>> 2015-12-18 16:09:48.941389 7fd5e2bec940 -1
>> filestore(/var/lib/ceph/tmp/mnt.IOnlxY) could not find
>> -1/23c2fcde/osd_superblock/0 in index: (2) No such file or directory
>> 2015-12-18 16:09:48.945392 7fd5e2bec940  1 journal close
>> /var/lib/ceph/tmp/mnt.IOnlxY/journal
>> 2015-12-18 16:09:48.946175 7fd5e2bec940 -1 created object store
>> /var/lib/ceph/tmp/mnt.IOnlxY journal /var/lib/ceph/tmp/mnt.IOnlxY/journal
>> for osd.10 fsid xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
>> 2015-12-18 16:09:48.946269 7fd5e2bec940 -1 auth: error reading file:
>> /var/lib/ceph/tmp/mnt.IOnlxY/keyring: can't open
>> /var/lib/ceph/tmp/mnt.IOnlxY/keyring: (2) No such file or directory
>> 2015-12-18 16:09:48.946623 7fd5e2bec940 -1 created new key in keyring
>> /var/lib/ceph/tmp/mnt.IOnlxY/keyring
>> 2015-12-18 16:09:50.698753 7fb5db130940  0 ceph version 9.2.0
>> (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299), process ceph-osd, pid 7045
>> 2015-12-18 16:09:50.745427 7fb5db130940  0
>> filestore(/var/lib/ceph/osd/ceph-10) backend xfs (magic 0x58465342)
>> 2015-12-18 16:09:50.745978 7fb5db130940  0
>> genericfilestorebackend(/var/lib/ceph/osd/ceph-10) detect_features: FIEMAP
>> ioctl is disabled via 'filestore fiemap' config option
>> 2015-12-18 16:09:50.745987 7fb5db130940  0
>> genericfilestorebackend(/var/lib/ceph/osd/ceph-10) detect_features:
>> SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
>> 2015-12-18 16:09:50.746012 7fb5db130940  0
>> genericfilestorebackend(/var/lib/ceph/osd/ceph-10) detect_features: splice
>> is supported
>> 2015-12-18 16:09:50.746517 7fb5db130940  0
>> genericfilestorebackend(/var/lib/ceph/osd/ceph-10) detect_features:
>> syncfs(2) syscall fully supported (by glibc and kernel)
>> 2015-12-18 16:09:50.746616 7fb5db130940  0
>> xfsfilestorebackend(/var/lib/ceph/osd/ceph-10) detect_features: extsize is
>> supported and your kernel >= 3.5
>> 2015-12-18 16:09:50.748775 7fb5db130940  0
>> filestore(/var/lib/ceph/osd/ceph-10) mount: enabling WRITEAHEAD journal
>> mode: checkpoint is not enabled
>> 2015-12-18 16:09:50.749005 7fb5db130940  1 journal _open
>> /var/lib/ceph/osd/ceph-10/journal fd 19: 1072693248 bytes, block size 4096
>> bytes, directio = 1, aio = 1
>> 2015-12-18 16:09:50.749256 7fb5db130940  1 journal _open
>> /var/lib/ceph/osd/ceph-10/journal fd 19: 1072693248 bytes, block size 4096
>> bytes, directio = 1, aio = 1
>> 2015-12-18 16:09:50.749632 7fb5db130940  1
>> filestore(/var/lib/ceph/osd/ceph-10) upgrade
>> 2015-12-18 16:09:50.783188 7fb5db130940  0 <cls>
>> cls/cephfs/cls_cephfs.cc:136: loading cephfs_size_scan
>> 2015-12-18 16:09:50.851735 7fb5db130940  0 <cls>
>> cls/hello/cls_hello.cc:305: loading cls_hello
>> 2015-12-18 16:09:50.851807 7fb5db130940  0 osd.10 0 crush map has
>> features 33816576, adjusting msgr requires for clients
>> 2015-12-18 16:09:50.851818 7fb5db130940  0 osd.10 0 crush map has
>> features 33816576 was 8705, adjusting msgr requires for mons
>> 2015-12-18 16:09:50.851821 7fb5db130940  0 osd.10 0 crush map has
>> features 33816576, adjusting msgr requires for osds
>> 2015-12-18 16:09:50.851965 7fb5db130940  0 osd.10 0 load_pgs
>> 2015-12-18 16:09:50.851988 7fb5db130940  0 osd.10 0 load_pgs opened 0 pgs
>> 2015-12-18 16:09:50.852822 7fb5db130940 -1 osd.10 0 log_to_monitors
>> {default=true}
>> 2015-12-18 16:09:50.870133 7fb5c7f39700  0 osd.10 0 ignoring osdmap until
>> we have initialized
>> 2015-12-18 16:09:50.870409 7fb5db130940  0 osd.10 0 done with init,
>> starting boot process
>> 2015-12-18 16:09:50.873357 7fb5c7f39700  0 osd.10 25804 crush map has
>> features 104186773504, adjusting msgr requires for clients
>> 2015-12-18 16:09:50.873368 7fb5c7f39700  0 osd.10 25804 crush map has
>> features 379064680448 was 33825281, adjusting msgr requires for mons
>> 2015-12-18 16:09:50.873374 7fb5c7f39700  0 osd.10 25804 crush map has
>> features 379064680448, adjusting msgr requires for osds
>> 2015-12-18 16:09:50.873377 7fb5c7f39700  0 osd.10 25804
>> check_osdmap_features enabling on-disk ERASURE CODES compat feature
>> 2015-12-18 16:09:50.876187 7fb5c7f39700  0 log_channel(cluster) log [WRN]
>> : failed to encode map e25805 with expected crc
>> 2015-12-18 16:09:50.879534 7fb5c7f39700  0 log_channel(cluster) log [WRN]
>> : failed to encode map e25805 with expected crc
>> 2015-12-18 16:09:50.950405 7fb5c7f39700  0 log_channel(cluster) log [WRN]
>> : failed to encode map e25905 with expected crc
>> 2015-12-18 16:09:50.983355 7fb5c7f39700  0 log_channel(cluster) log [WRN]
>> : failed to encode map e25905 with expected crc
>>
>> I'm running this on Ubuntu 14.04.3 with the linux-image-generic-lts-wily
>> kernel (4.2.0-21.25~14.04.1).
>>
>> Are you running a mixed cluster right now too?  For example, this is my
>> cluster right now:
>>
>> root@b1:~# ceph tell osd.* version | grep version | uniq -c
>> osd.10: Error ENXIO: problem getting command descriptions from osd.10
>> osd.10: problem getting command descriptions from osd.10
>>      11     "version": "ceph version 9.2.0
>> (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)"
>>      15     "version": "ceph version 0.94.5
>> (9764da52395923e0b32908d83a9f7304401fee43)"
>>
>> Bryan
>>
>> From: ceph-users <ceph-users-boun...@lists.ceph.com> on behalf of Bob R <
>> b...@drinksbeer.org>
>> Date: Wednesday, December 16, 2015 at 11:45 AM
>> To: ceph-users <ceph-users@lists.ceph.com>
>> Subject: [ceph-users] OSDs stuck in booting state on CentOS 7.2.1511 and
>> ceph infernalis 9.2.0
>>
>> We've been operating this cluster relatively incident-free since 0.86. On
>> Monday I did a yum update on one node, ceph00, and after rebooting we're
>> seeing every OSD on it stuck in the 'booting' state. I've tried removing
>> all of the OSDs and recreating them with ceph-deploy (ceph-disk required a
>> modification to use 'partx -a' rather than partprobe), but we see the same
>> status. I'm not sure how to troubleshoot this further. The OSDs on this
>> host are now running as the ceph user, which may be related to the issue,
>> as the other three hosts are running as root (although I followed the
>> steps listed to upgrade from hammer to infernalis and did 'chown -R
>> ceph:ceph /var/lib/ceph' on each node).
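
For reference, the ownership step from the Infernalis upgrade notes that Bob mentions looks like this per OSD node. This is a sketch, not his exact procedure: it assumes a systemd host (the 'ceph.target' unit and the /var/lib/ceph path are the stock ones), and it prints the commands by default so the sequence can be reviewed; the daemons must be stopped before the chown.

```shell
# Hammer -> Infernalis ownership change, per OSD node. Dry run by default;
# pass "run" to actually execute the commands.
upgrade_ownership() {
    mode=${1:-dry-run}
    for cmd in \
        "systemctl stop ceph.target" \
        "chown -R ceph:ceph /var/lib/ceph" \
        "systemctl start ceph.target"
    do
        if [ "$mode" = "run" ]; then $cmd; else echo "$cmd"; fi
    done
}
```

On large OSDs the recursive chown can take a long while, which is one reason a partially applied upgrade is easy to end up with.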
>>
>> [root@ceph00 ceph]# lsb_release -idrc
>> Distributor ID: CentOS
>> Description:    CentOS Linux release 7.2.1511 (Core)
>> Release:        7.2.1511
>> Codename:       Core
>>
>> [root@ceph00 ceph]# ceph --version
>> ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)
>>
>> [root@ceph00 ceph]# ceph daemon osd.0 status
>> {
>>     "cluster_fsid": "2e4ea2c0-fb62-41fa-b7b7-e34d759b851e",
>>     "osd_fsid": "ddf659ad-a3db-4094-b4d0-7d50f34b8f75",
>>     "whoami": 0,
>>     "state": "booting",
>>     "oldest_map": 25243,
>>     "newest_map": 26610,
>>     "num_pgs": 0
>> }
>>
>> [root@ceph00 ceph]# ceph daemon osd.3 status
>> {
>>     "cluster_fsid": "2e4ea2c0-fb62-41fa-b7b7-e34d759b851e",
>>     "osd_fsid": "8b1acd8a-645d-4dc2-8c1d-6dbb1715265f",
>>     "whoami": 3,
>>     "state": "booting",
>>     "oldest_map": 25243,
>>     "newest_map": 26612,
>>     "num_pgs": 0
>> }
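
Both daemons report "state": "booting" with zero PGs, matching what Bryan saw. To check every OSD on a host at once, the state field can be pulled out of that JSON; a sketch (the python3 one-liner and the loop are my additions, and the admin-socket paths shown are the defaults):

```shell
# Extract the "state" field from 'ceph daemon osd.N status' JSON on stdin.
osd_state() {
    python3 -c 'import json, sys; print(json.load(sys.stdin)["state"])'
}

# Typical use, looping over the default admin sockets on one host:
#   for s in /var/run/ceph/ceph-osd.*.asok; do
#       printf '%s: %s\n' "$s" "$(ceph daemon "$s" status | osd_state)"
#   done
```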
>>
>> [root@ceph00 ceph]# ceph osd tree
>> ID  WEIGHT    TYPE NAME           UP/DOWN REWEIGHT PRIMARY-AFFINITY
>> -23   1.43999 root ssd
>> -19         0     host ceph00_ssd
>> -20   0.48000     host ceph01_ssd
>>  40   0.48000         osd.40           up  1.00000          1.00000
>> -21   0.48000     host ceph02_ssd
>>  43   0.48000         osd.43           up  1.00000          1.00000
>> -22   0.48000     host ceph03_ssd
>>  41   0.48000         osd.41           up  1.00000          1.00000
>>  -1 120.00000 root default
>> -17  80.00000     room b1
>> -14  40.00000         host ceph01
>>   1   4.00000             osd.1        up  1.00000          1.00000
>>   4   4.00000             osd.4        up  1.00000          1.00000
>>  18   4.00000             osd.18       up  1.00000          1.00000
>>  19   4.00000             osd.19       up  1.00000          1.00000
>>  20   4.00000             osd.20       up  1.00000          1.00000
>>  21   4.00000             osd.21       up  1.00000          1.00000
>>  22   4.00000             osd.22       up  1.00000          1.00000
>>  23   4.00000             osd.23       up  1.00000          1.00000
>>  24   4.00000             osd.24       up  1.00000          1.00000
>>  25   4.00000             osd.25       up  1.00000          1.00000
>> -16  40.00000         host ceph03
>>  30   4.00000             osd.30       up  1.00000          1.00000
>>  31   4.00000             osd.31       up  1.00000          1.00000
>>  32   4.00000             osd.32       up  1.00000          1.00000
>>  33   4.00000             osd.33       up  1.00000          1.00000
>>  34   4.00000             osd.34       up  1.00000          1.00000
>>  35   4.00000             osd.35       up  1.00000          1.00000
>>  36   4.00000             osd.36       up  1.00000          1.00000
>>  37   4.00000             osd.37       up  1.00000          1.00000
>>  38   4.00000             osd.38       up  1.00000          1.00000
>>  39   4.00000             osd.39       up  1.00000          1.00000
>> -18  40.00000     room b2
>> -13         0         host ceph00
>> -15  40.00000         host ceph02
>>   2   4.00000             osd.2        up  1.00000          1.00000
>>   5   4.00000             osd.5        up  1.00000          1.00000
>>  14   4.00000             osd.14       up  1.00000          1.00000
>>  15   4.00000             osd.15       up  1.00000          1.00000
>>  16   4.00000             osd.16       up  1.00000          1.00000
>>  17   4.00000             osd.17       up  1.00000          1.00000
>>  26   4.00000             osd.26       up  1.00000          1.00000
>>  27   4.00000             osd.27       up  1.00000          1.00000
>>  28   4.00000             osd.28       up  1.00000          1.00000
>>  29   4.00000             osd.29       up  1.00000          1.00000
>>   0         0 osd.0                  down        0          1.00000
>>   3         0 osd.3                  down        0          1.00000
>>   6         0 osd.6                  down        0          1.00000
>>   7         0 osd.7                  down        0          1.00000
>>   8         0 osd.8                  down        0          1.00000
>>   9         0 osd.9                  down        0          1.00000
>>  10         0 osd.10                 down        0          1.00000
>>  11         0 osd.11                 down        0          1.00000
>>  12         0 osd.12                 down        0          1.00000
>>  13         0 osd.13                 down        0          1.00000
>>
>>
>> Any assistance is greatly appreciated.
>>
>> Bob
>>
>>
>>
>
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com