Bryan,

I rebooted another host that hadn't been updated to CentOS 7.2, and its OSDs
also failed to come out of the booting state. I thought I'd restarted each
OSD host after upgrading them to infernalis, but I must have been mistaken:
after running ceph tell osd.* version I saw we were on a mix of v0.94.1,
v0.94.2, v0.94.4, and v0.94.5. I've downgraded the two hosts we were having
problems with to hammer v0.94.5, and once the cluster is healthy again we'll
try the upgrade again.
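
For reference, the rough per-host sequence we plan to follow on the next
attempt looks like this (just a sketch: it assumes the systemd units that
ship with infernalis on CentOS 7, and <id> stands for each OSD id on the
host; adjust the service commands to whatever init system manages your
OSDs):

    ceph osd set noout                # keep OSDs from being marked out during the restart
    yum update ceph                   # or install the pinned hammer packages when downgrading
    systemctl restart ceph-osd@<id>   # repeat for every OSD on the host
    ceph tell osd.* version           # confirm all OSDs now report the same version
    ceph osd unset noout              # once the cluster is HEALTH_OK again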

Good luck.

Bob

On Fri, Dec 18, 2015 at 3:21 PM, Stillwell, Bryan <
bryan.stillw...@twcable.com> wrote:

> I ran into a similar problem while in the middle of upgrading from Hammer
> (0.94.5) to Infernalis (9.2.0).  I decided to try rebuilding one of the
> OSDs with 'ceph-disk prepare /dev/sdb', but it never comes up:
>
> root@b3:~# ceph daemon osd.10 status
> {
>     "cluster_fsid": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
>     "osd_fsid": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
>     "whoami": 10,
>     "state": "booting",
>     "oldest_map": 25804,
>     "newest_map": 25904,
>     "num_pgs": 0
> }
>
> Here's what is written to /var/log/ceph/osd/ceph-osd.10.log:
>
> 2015-12-18 16:09:48.928462 7fd5e2bec940  0 ceph version 9.2.0
> (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299), process ceph-osd, pid 6866
> 2015-12-18 16:09:48.931387 7fd5e2bec940  1
> filestore(/var/lib/ceph/tmp/mnt.IOnlxY) mkfs in /var/lib/ceph/tmp/mnt.IOnlxY
> 2015-12-18 16:09:48.931417 7fd5e2bec940  1
> filestore(/var/lib/ceph/tmp/mnt.IOnlxY) mkfs fsid is already set to
> xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
> 2015-12-18 16:09:48.931422 7fd5e2bec940  1
> filestore(/var/lib/ceph/tmp/mnt.IOnlxY) write_version_stamp 4
> 2015-12-18 16:09:48.932671 7fd5e2bec940  0
> filestore(/var/lib/ceph/tmp/mnt.IOnlxY) backend xfs (magic 0x58465342)
> 2015-12-18 16:09:48.934953 7fd5e2bec940  1
> filestore(/var/lib/ceph/tmp/mnt.IOnlxY) leveldb db exists/created
> 2015-12-18 16:09:48.935082 7fd5e2bec940  1 journal _open
> /var/lib/ceph/tmp/mnt.IOnlxY/journal fd 11: 1072693248 bytes, block size
> 4096 bytes, directio = 1, aio = 1
> 2015-12-18 16:09:48.935218 7fd5e2bec940 -1 journal check: ondisk fsid
> 00000000-0000-0000-0000-000000000000 doesn't match
> expected xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx, invalid (someone else's?)
> journal
> 2015-12-18 16:09:48.935227 7fd5e2bec940  1 journal close
> /var/lib/ceph/tmp/mnt.IOnlxY/journal
> 2015-12-18 16:09:48.935452 7fd5e2bec940  1 journal _open
> /var/lib/ceph/tmp/mnt.IOnlxY/journal fd 11: 1072693248 bytes, block size
> 4096 bytes, directio = 1, aio = 1
> 2015-12-18 16:09:48.935771 7fd5e2bec940  0
> filestore(/var/lib/ceph/tmp/mnt.IOnlxY) mkjournal created journal on
> /var/lib/ceph/tmp/mnt.IOnlxY/journal
> 2015-12-18 16:09:48.935803 7fd5e2bec940  1
> filestore(/var/lib/ceph/tmp/mnt.IOnlxY) mkfs done in
> /var/lib/ceph/tmp/mnt.IOnlxY
> 2015-12-18 16:09:48.935919 7fd5e2bec940  0
> filestore(/var/lib/ceph/tmp/mnt.IOnlxY) backend xfs (magic 0x58465342)
> 2015-12-18 16:09:48.936548 7fd5e2bec940  0
> genericfilestorebackend(/var/lib/ceph/tmp/mnt.IOnlxY) detect_features:
> FIEMAP ioctl is disabled via 'filestore fiemap' config option
> 2015-12-18 16:09:48.936559 7fd5e2bec940  0
> genericfilestorebackend(/var/lib/ceph/tmp/mnt.IOnlxY) detect_features:
> SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
> 2015-12-18 16:09:48.936588 7fd5e2bec940  0
> genericfilestorebackend(/var/lib/ceph/tmp/mnt.IOnlxY) detect_features:
> splice is supported
> 2015-12-18 16:09:48.938319 7fd5e2bec940  0
> genericfilestorebackend(/var/lib/ceph/tmp/mnt.IOnlxY) detect_features:
> syncfs(2) syscall fully supported (by glibc and kernel)
> 2015-12-18 16:09:48.938394 7fd5e2bec940  0
> xfsfilestorebackend(/var/lib/ceph/tmp/mnt.IOnlxY) detect_features: extsize
> is supported and your kernel >= 3.5
> 2015-12-18 16:09:48.940420 7fd5e2bec940  0
> filestore(/var/lib/ceph/tmp/mnt.IOnlxY) mount: enabling WRITEAHEAD journal
> mode: checkpoint is not enabled
> 2015-12-18 16:09:48.940646 7fd5e2bec940  1 journal _open
> /var/lib/ceph/tmp/mnt.IOnlxY/journal fd 17: 1072693248 bytes, block size
> 4096 bytes, directio = 1, aio = 1
> 2015-12-18 16:09:48.940865 7fd5e2bec940  1 journal _open
> /var/lib/ceph/tmp/mnt.IOnlxY/journal fd 17: 1072693248 bytes, block size
> 4096 bytes, directio = 1, aio = 1
> 2015-12-18 16:09:48.941270 7fd5e2bec940  1
> filestore(/var/lib/ceph/tmp/mnt.IOnlxY) upgrade
> 2015-12-18 16:09:48.941389 7fd5e2bec940 -1
> filestore(/var/lib/ceph/tmp/mnt.IOnlxY) could not find
> -1/23c2fcde/osd_superblock/0 in index: (2) No such file or directory
> 2015-12-18 16:09:48.945392 7fd5e2bec940  1 journal close
> /var/lib/ceph/tmp/mnt.IOnlxY/journal
> 2015-12-18 16:09:48.946175 7fd5e2bec940 -1 created object store
> /var/lib/ceph/tmp/mnt.IOnlxY journal /var/lib/ceph/tmp/mnt.IOnlxY/journal
> for osd.10 fsid xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
> 2015-12-18 16:09:48.946269 7fd5e2bec940 -1 auth: error reading file:
> /var/lib/ceph/tmp/mnt.IOnlxY/keyring: can't open
> /var/lib/ceph/tmp/mnt.IOnlxY/keyring: (2) No such file or directory
> 2015-12-18 16:09:48.946623 7fd5e2bec940 -1 created new key in keyring
> /var/lib/ceph/tmp/mnt.IOnlxY/keyring
> 2015-12-18 16:09:50.698753 7fb5db130940  0 ceph version 9.2.0
> (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299), process ceph-osd, pid 7045
> 2015-12-18 16:09:50.745427 7fb5db130940  0
> filestore(/var/lib/ceph/osd/ceph-10) backend xfs (magic 0x58465342)
> 2015-12-18 16:09:50.745978 7fb5db130940  0
> genericfilestorebackend(/var/lib/ceph/osd/ceph-10) detect_features: FIEMAP
> ioctl is disabled via 'filestore fiemap' config option
> 2015-12-18 16:09:50.745987 7fb5db130940  0
> genericfilestorebackend(/var/lib/ceph/osd/ceph-10) detect_features:
> SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
> 2015-12-18 16:09:50.746012 7fb5db130940  0
> genericfilestorebackend(/var/lib/ceph/osd/ceph-10) detect_features: splice
> is supported
> 2015-12-18 16:09:50.746517 7fb5db130940  0
> genericfilestorebackend(/var/lib/ceph/osd/ceph-10) detect_features:
> syncfs(2) syscall fully supported (by glibc and kernel)
> 2015-12-18 16:09:50.746616 7fb5db130940  0
> xfsfilestorebackend(/var/lib/ceph/osd/ceph-10) detect_features: extsize is
> supported and your kernel >= 3.5
> 2015-12-18 16:09:50.748775 7fb5db130940  0
> filestore(/var/lib/ceph/osd/ceph-10) mount: enabling WRITEAHEAD journal
> mode: checkpoint is not enabled
> 2015-12-18 16:09:50.749005 7fb5db130940  1 journal _open
> /var/lib/ceph/osd/ceph-10/journal fd 19: 1072693248 bytes, block size 4096
> bytes, directio = 1, aio = 1
> 2015-12-18 16:09:50.749256 7fb5db130940  1 journal _open
> /var/lib/ceph/osd/ceph-10/journal fd 19: 1072693248 bytes, block size 4096
> bytes, directio = 1, aio = 1
> 2015-12-18 16:09:50.749632 7fb5db130940  1
> filestore(/var/lib/ceph/osd/ceph-10) upgrade
> 2015-12-18 16:09:50.783188 7fb5db130940  0 <cls>
> cls/cephfs/cls_cephfs.cc:136: loading cephfs_size_scan
> 2015-12-18 16:09:50.851735 7fb5db130940  0 <cls>
> cls/hello/cls_hello.cc:305: loading cls_hello
> 2015-12-18 16:09:50.851807 7fb5db130940  0 osd.10 0 crush map has features
> 33816576, adjusting msgr requires for clients
> 2015-12-18 16:09:50.851818 7fb5db130940  0 osd.10 0 crush map has features
> 33816576 was 8705, adjusting msgr requires for mons
> 2015-12-18 16:09:50.851821 7fb5db130940  0 osd.10 0 crush map has features
> 33816576, adjusting msgr requires for osds
> 2015-12-18 16:09:50.851965 7fb5db130940  0 osd.10 0 load_pgs
> 2015-12-18 16:09:50.851988 7fb5db130940  0 osd.10 0 load_pgs opened 0 pgs
> 2015-12-18 16:09:50.852822 7fb5db130940 -1 osd.10 0 log_to_monitors
> {default=true}
> 2015-12-18 16:09:50.870133 7fb5c7f39700  0 osd.10 0 ignoring osdmap until
> we have initialized
> 2015-12-18 16:09:50.870409 7fb5db130940  0 osd.10 0 done with init,
> starting boot process
> 2015-12-18 16:09:50.873357 7fb5c7f39700  0 osd.10 25804 crush map has
> features 104186773504, adjusting msgr requires for clients
> 2015-12-18 16:09:50.873368 7fb5c7f39700  0 osd.10 25804 crush map has
> features 379064680448 was 33825281, adjusting msgr requires for mons
> 2015-12-18 16:09:50.873374 7fb5c7f39700  0 osd.10 25804 crush map has
> features 379064680448, adjusting msgr requires for osds
> 2015-12-18 16:09:50.873377 7fb5c7f39700  0 osd.10 25804
> check_osdmap_features enabling on-disk ERASURE CODES compat feature
> 2015-12-18 16:09:50.876187 7fb5c7f39700  0 log_channel(cluster) log [WRN]
> : failed to encode map e25805 with expected crc
> 2015-12-18 16:09:50.879534 7fb5c7f39700  0 log_channel(cluster) log [WRN]
> : failed to encode map e25805 with expected crc
> 2015-12-18 16:09:50.950405 7fb5c7f39700  0 log_channel(cluster) log [WRN]
> : failed to encode map e25905 with expected crc
> 2015-12-18 16:09:50.983355 7fb5c7f39700  0 log_channel(cluster) log [WRN]
> : failed to encode map e25905 with expected crc
>
> I'm running this on Ubuntu 14.04.3 with the linux-image-generic-lts-wily
> kernel (4.2.0-21.25~14.04.1).
>
> Are you running a mixed-version cluster right now too?  For example, this
> is what mine looks like at the moment:
>
> root@b1:~# ceph tell osd.* version | grep version | uniq -c
> osd.10: Error ENXIO: problem getting command descriptions from osd.10
> osd.10: problem getting command descriptions from osd.10
>      11     "version": "ceph version 9.2.0
> (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)"
>      15     "version": "ceph version 0.94.5
> (9764da52395923e0b32908d83a9f7304401fee43)"
>
> Bryan
>
> From: ceph-users <ceph-users-boun...@lists.ceph.com> on behalf of Bob R <
> b...@drinksbeer.org>
> Date: Wednesday, December 16, 2015 at 11:45 AM
> To: ceph-users <ceph-users@lists.ceph.com>
> Subject: [ceph-users] OSDs stuck in booting state on CentOS 7.2.1511 and
> ceph infernalis 9.2.0
>
> We've been operating a cluster relatively incident-free since 0.86. On
> Monday I did a yum update on one node, ceph00, and after rebooting we're
> seeing every OSD on that host stuck in the 'booting' state. I've tried
> removing all of the OSDs and recreating them with ceph-deploy (ceph-disk
> required a modification to use partx -a rather than partprobe), but we see
> the same status. I'm not sure how to troubleshoot this further. The OSDs on
> this host are now running as the ceph user, which may be related to the
> issue, since the other three hosts are still running as root (although I
> followed the steps listed for upgrading from hammer to infernalis and did
> chown -R ceph:ceph /var/lib/ceph on each node).
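>
> The ownership-related part of that went roughly like this on each host (a
> sketch rather than the exact commands I ran; <id> stands for each OSD id
> on the host):
>
>     systemctl stop ceph-osd@<id>       # stop every OSD on the host first
>     chown -R ceph:ceph /var/lib/ceph   # infernalis daemons run as the ceph user
>     systemctl start ceph-osd@<id>      # bring the OSDs back up afterwards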
>
> [root@ceph00 ceph]# lsb_release -idrc
> Distributor ID: CentOS
> Description:    CentOS Linux release 7.2.1511 (Core)
> Release:        7.2.1511
> Codename:       Core
>
> [root@ceph00 ceph]# ceph --version
> ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)
>
> [root@ceph00 ceph]# ceph daemon osd.0 status
> {
>     "cluster_fsid": "2e4ea2c0-fb62-41fa-b7b7-e34d759b851e",
>     "osd_fsid": "ddf659ad-a3db-4094-b4d0-7d50f34b8f75",
>     "whoami": 0,
>     "state": "booting",
>     "oldest_map": 25243,
>     "newest_map": 26610,
>     "num_pgs": 0
> }
>
> [root@ceph00 ceph]# ceph daemon osd.3 status
> {
>     "cluster_fsid": "2e4ea2c0-fb62-41fa-b7b7-e34d759b851e",
>     "osd_fsid": "8b1acd8a-645d-4dc2-8c1d-6dbb1715265f",
>     "whoami": 3,
>     "state": "booting",
>     "oldest_map": 25243,
>     "newest_map": 26612,
>     "num_pgs": 0
> }
>
> [root@ceph00 ceph]# ceph osd tree
> ID  WEIGHT    TYPE NAME           UP/DOWN REWEIGHT PRIMARY-AFFINITY
> -23   1.43999 root ssd
> -19         0     host ceph00_ssd
> -20   0.48000     host ceph01_ssd
>  40   0.48000         osd.40           up  1.00000          1.00000
> -21   0.48000     host ceph02_ssd
>  43   0.48000         osd.43           up  1.00000          1.00000
> -22   0.48000     host ceph03_ssd
>  41   0.48000         osd.41           up  1.00000          1.00000
>  -1 120.00000 root default
> -17  80.00000     room b1
> -14  40.00000         host ceph01
>   1   4.00000             osd.1        up  1.00000          1.00000
>   4   4.00000             osd.4        up  1.00000          1.00000
>  18   4.00000             osd.18       up  1.00000          1.00000
>  19   4.00000             osd.19       up  1.00000          1.00000
>  20   4.00000             osd.20       up  1.00000          1.00000
>  21   4.00000             osd.21       up  1.00000          1.00000
>  22   4.00000             osd.22       up  1.00000          1.00000
>  23   4.00000             osd.23       up  1.00000          1.00000
>  24   4.00000             osd.24       up  1.00000          1.00000
>  25   4.00000             osd.25       up  1.00000          1.00000
> -16  40.00000         host ceph03
>  30   4.00000             osd.30       up  1.00000          1.00000
>  31   4.00000             osd.31       up  1.00000          1.00000
>  32   4.00000             osd.32       up  1.00000          1.00000
>  33   4.00000             osd.33       up  1.00000          1.00000
>  34   4.00000             osd.34       up  1.00000          1.00000
>  35   4.00000             osd.35       up  1.00000          1.00000
>  36   4.00000             osd.36       up  1.00000          1.00000
>  37   4.00000             osd.37       up  1.00000          1.00000
>  38   4.00000             osd.38       up  1.00000          1.00000
>  39   4.00000             osd.39       up  1.00000          1.00000
> -18  40.00000     room b2
> -13         0         host ceph00
> -15  40.00000         host ceph02
>   2   4.00000             osd.2        up  1.00000          1.00000
>   5   4.00000             osd.5        up  1.00000          1.00000
>  14   4.00000             osd.14       up  1.00000          1.00000
>  15   4.00000             osd.15       up  1.00000          1.00000
>  16   4.00000             osd.16       up  1.00000          1.00000
>  17   4.00000             osd.17       up  1.00000          1.00000
>  26   4.00000             osd.26       up  1.00000          1.00000
>  27   4.00000             osd.27       up  1.00000          1.00000
>  28   4.00000             osd.28       up  1.00000          1.00000
>  29   4.00000             osd.29       up  1.00000          1.00000
>   0         0 osd.0                  down        0          1.00000
>   3         0 osd.3                  down        0          1.00000
>   6         0 osd.6                  down        0          1.00000
>   7         0 osd.7                  down        0          1.00000
>   8         0 osd.8                  down        0          1.00000
>   9         0 osd.9                  down        0          1.00000
>  10         0 osd.10                 down        0          1.00000
>  11         0 osd.11                 down        0          1.00000
>  12         0 osd.12                 down        0          1.00000
>  13         0 osd.13                 down        0          1.00000
>
>
> Any assistance is greatly appreciated.
>
> Bob
>
>
