After some time, 4 more OSD:s from one server dropped out, and it now seems that 
only 3 OSD:s from 1 server (I have 3 servers, each with 4 OSD:s) are marked as 
up; the other 9 are down. I have shut the servers down for now, since I will not 
have any time to work on this until the weekend.

Any suggestions on how to get the system online again are most welcome. The OSD 
disks have not crashed and I hope to be able to get them to join the cluster 
again and get the data back.

I am not sure what I did wrong when upgrading from Hammer to Infernalis. At 
first I thought the problem was that I didn't remove the ceph user and group 
before upgrading, but now I have no clue; I do not think I actually had a ceph 
user before Infernalis.

Any help or suggestions on what I can try to get the system online are most welcome.


From: ceph-users On Behalf Of Claes 
Sent: 16 November 2015 15:42
To: Nick Fisk; 'Josef Johansson'
Cc: 'ceph-users'
Subject: Re: [ceph-users] OSD:s failing out after upgrade to 9.2.0 on Ubuntu 

I tried shutting down my 3 servers and starting them again, but I just got 
back to where I was yesterday, with 7 working OSD:s and 5 down. I will have to 
look more into this. As long as the disks are OK and I do not erase the data on 
the OSD:s, I hope I will be able to get the system online again…


From: Nick Fisk
Sent: 16 November 2015 11:04
To: 'Josef Johansson'; Claes Sahlström
Cc: 'ceph-users'
Subject: RE: [ceph-users] OSD:s failing out after upgrade to 9.2.0 on Ubuntu 

I think I may have experienced something similar after upgrading to Infernalis 
as well. After rebooting all the Mons and OSD nodes everything returned to 
normal. I wasn’t suspicious of it at the time, but seeing this has got me 

I was seeing the same in the logs as you, the last line
“done with init, starting boot process”

And then nothing.

I was also seeing the peering and activating stages take 1hr+

From: ceph-users On Behalf Of Josef 
Sent: 15 November 2015 22:42
To: Claes Sahlström
Cc: ceph-users
Subject: Re: [ceph-users] OSD:s failing out after upgrade to 9.2.0 on Ubuntu 

cc the list as well
On 15 Nov 2015, at 23:41, Josef Johansson wrote:


So it’s just frozen at that point?

You should definitely increase the logging and restart the OSD. I believe it’s 
debug osd 20 and debug mon 20.
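
For reference, the debug levels mentioned above can also be raised on a daemon that is already running, without editing ceph.conf. The following is only a sketch: it assumes the daemon's admin socket is reachable on the local host and that the command is run as root there, and the helper name set_debug is made up for illustration. The persistent equivalent would be a line such as "debug osd = 20" in the [osd] section of ceph.conf, followed by a restart of the OSD.

```shell
# Hypothetical helper (not part of Ceph): raise a debug level on a running
# daemon through its local admin socket using "ceph daemon ... config set".
# It has to be run on the host where the daemon lives.
set_debug() {
    daemon_name="$1"   # e.g. osd.11 or mon.black
    subsystem="$2"     # e.g. osd, mon, ms
    level="$3"         # e.g. 20
    ceph daemon "$daemon_name" config set "debug_${subsystem}" "$level"
}

# Example calls (commented out so that sourcing this file changes nothing):
# set_debug osd.11 osd 20
# set_debug mon.black mon 20
```

The extra output ends up in the daemon's own log file, so the levels should be lowered again once the startup has been captured.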

A quick google brings up a case where UUID was crashing.

On 15 Nov 2015, at 23:29, Claes Sahlström wrote:

Hi and thanks for helping.

None that I can see when scanning the log file; it actually looks to me like it 
starts up just fine when I start the OSD. This is the last time I restarted it:

2015-11-15 22:58:13.445684 7f6f8f9be940  0 set uid:gid to 0:0
2015-11-15 22:58:13.445854 7f6f8f9be940  0 ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299), process ceph-osd, pid 5463
2015-11-15 22:58:13.510385 7f6f8f9be940  0 filestore(/ceph/osd.11) backend xfs (magic 0x58465342)
2015-11-15 22:58:13.511120 7f6f8f9be940  0 genericfilestorebackend(/ceph/osd.11) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2015-11-15 22:58:13.511129 7f6f8f9be940  0 genericfilestorebackend(/ceph/osd.11) detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
2015-11-15 22:58:13.511158 7f6f8f9be940  0 genericfilestorebackend(/ceph/osd.11) detect_features: splice is supported
2015-11-15 22:58:13.515688 7f6f8f9be940  0 genericfilestorebackend(/ceph/osd.11) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2015-11-15 22:58:13.515934 7f6f8f9be940  0 xfsfilestorebackend(/ceph/osd.11) detect_features: extsize is supported and your kernel >= 3.5
2015-11-15 22:58:13.600801 7f6f8f9be940  0 filestore(/ceph/osd.11) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2015-11-15 22:58:39.150619 7f6f8f9be940  1 journal _open /dev/orange/journal-osd.11 fd 19: 23622320128 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-11-15 22:58:39.160621 7f6f8f9be940  1 journal _open /dev/orange/journal-osd.11 fd 19: 23622320128 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-11-15 22:58:39.192660 7f6f8f9be940  1 filestore(/ceph/osd.11) upgrade
2015-11-15 22:58:39.200192 7f6f8f9be940  0 <cls> cls/cephfs/ loading cephfs_size_scan
2015-11-15 22:58:39.200457 7f6f8f9be940  0 <cls> cls/hello/ loading cls_hello
2015-11-15 22:58:39.206906 7f6f8f9be940  0 osd.11 35462 crush map has features 1107558400, adjusting msgr requires for clients
2015-11-15 22:58:39.206983 7f6f8f9be940  0 osd.11 35462 crush map has features 1107558400 was 8705, adjusting msgr requires for mons
2015-11-15 22:58:39.207030 7f6f8f9be940  0 osd.11 35462 crush map has features 1107558400, adjusting msgr requires for osds
2015-11-15 22:58:40.712757 7f6f8f9be940  0 osd.11 35462 load_pgs
2015-11-15 22:59:09.980042 7f6f8f9be940  0 osd.11 35462 load_pgs opened 874 pgs
2015-11-15 22:59:09.981963 7f6f8f9be940 -1 osd.11 35462 log_to_monitors 
2015-11-15 22:59:09.990204 7f6f71312700  0 osd.11 35462 ignoring osdmap until we have initialized
2015-11-15 22:59:11.194276 7f6f8f9be940  0 osd.11 35462 done with init, starting boot process

From: Josef Johansson
Sent: 15 November 2015 23:10
To: Claes Sahlström
Subject: Re: [ceph-users] OSD:s failing out after upgrade to 9.2.0 on Ubuntu 


Could you catch any segmentation faults in /var/log/ceph/ceph-osd.11.log ?
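
A quick way to check for this is to search the log for the markers a crashing daemon usually leaves behind. This is only a sketch: the function name scan_osd_log is made up, and the patterns ("Caught signal", "Segmentation fault", "FAILED assert") are strings that Ceph daemons typically print when they crash, so an empty result does not prove that nothing went wrong.

```shell
# Hypothetical helper: print every line (with its line number) of a log
# file that looks like a crash marker. grep exits non-zero if none match.
scan_osd_log() {
    log_file="$1"
    grep -n -E 'Caught signal|Segmentation fault|FAILED assert' "$log_file"
}

# Example (commented out; the path is the one mentioned above):
# scan_osd_log /var/log/ceph/ceph-osd.11.log
```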


On 15 Nov 2015, at 23:06, Claes Sahlström wrote:

Sorry to almost double-post. I noticed that it seemed like one mon was down, but 
they do actually all seem to be OK. Of the 11 OSD:s that are in, 4 fall out and 
I am back at 7 healthy OSD:s again:

root@black:/var/lib/ceph/mon# ceph -s
    cluster ee8eae7a-5994-48bc-bd43-aa07639a543b
     health HEALTH_WARN
            108 pgs backfill
            37 pgs backfilling
            2339 pgs degraded
            105 pgs down
            237 pgs peering
            138 pgs stale
            765 pgs stuck degraded
            173 pgs stuck inactive
            138 pgs stuck stale
            3327 pgs stuck unclean
            765 pgs stuck undersized
            2339 pgs undersized
            recovery 1612956/6242357 objects degraded (25.839%)
            recovery 772311/6242357 objects misplaced (12.372%)
            too many PGs per OSD (561 > max 350)
            4/11 in osds are down
     monmap e3: 3 mons at 
            election epoch 456, quorum 0,1,2 black,purple,orange
     mdsmap e5: 0/0/1 up
     osdmap e35627: 12 osds: 7 up, 11 in; 1201 remapped pgs
      pgmap v8215121: 4608 pgs, 3 pools, 11897 GB data, 2996 kobjects
            17203 GB used, 8865 GB / 26069 GB avail
            1612956/6242357 objects degraded (25.839%)
            772311/6242357 objects misplaced (12.372%)
                2137 active+undersized+degraded
                1052 active+clean
                 783 active+remapped
                 137 stale+active+undersized+degraded
                 104 down+peering
                 102 active+remapped+wait_backfill
                  66 remapped+peering
                  65 peering
                  33 active+remapped+backfilling
                  27 activating+undersized+degraded
                  26 active+undersized+degraded+remapped
                  25 activating
                  16 remapped
                  14 inactive
                   7 activating+remapped
                   6 active+undersized+degraded+remapped+wait_backfill
                   4 active+undersized+degraded+remapped+backfilling
                   2 activating+undersized+degraded+remapped
                   1 down+remapped+peering
                   1 stale+remapped+peering
recovery io 22108 MB/s, 5581 objects/s
  client io 1065 MB/s rd, 2317 MB/s wr, 11435 op/s

From: ceph-users On Behalf Of Claes 
Sent: 15 November 2015 21:56
Subject: [ceph-users] OSD:s failing out after upgrade to 9.2.0 on Ubuntu 14.04


I have a problem I hope is possible to solve…

I upgraded to 9.2.0 a couple of days back and I missed this part:
“If your systems already have a ceph user, upgrading the package will cause 
problems. We suggest you first remove or rename the existing ‘ceph’ user and 
‘ceph’ group before upgrading.”

I guess that might be the reason why my OSD:s have started to die on me.

I can get the OSD services to start with the file ownership set to root:root and 
setuser match path = /var/lib/ceph/$type/$cluster-$i

I am really not sure where to look to find out what is wrong.

When I had first upgraded and the OSD:s were restarted, I got a permission 
denied error on the OSD directories; that was solved by adding the “setuser 
match” option in ceph.conf.
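
With that option, the daemon only switches to the ceph user if the matched path is owned by that user, and otherwise keeps running as root, so it can help to look at the ownership directly. A small sketch, assuming the OSD data directory is /ceph/osd.11 as in the OSD log in this thread; the function name show_owner is hypothetical.

```shell
# Hypothetical helper: print "owner:group" of a directory, for example an
# OSD data directory, by reading the third and fourth fields of "ls -ld".
show_owner() {
    ls -ld "$1" | awk '{ print $3 ":" $4 }'
}

# Example (commented out; the path is taken from the OSD log in this thread):
# show_owner /ceph/osd.11
```

A result of root:root would be consistent with the "set uid:gid to 0:0" line in the OSD log.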

With 5 of 12 OSD:s down I am starting to worry, and since I only have one 
replica I might lose some data. As I mentioned, the OSD services start and “ceph 
osd in” does not give me any error, but the OSD never comes up.

Any suggestions or helpful tips are most welcome,


 -1 24.00000 root default
 -2  8.00000     host black
  3  2.00000         osd.3        up  1.00000          1.00000
  2  2.00000         osd.2        up  1.00000          1.00000
  0  2.00000         osd.0        up  1.00000          1.00000
  1  2.00000         osd.1        up  1.00000          1.00000
 -3  8.00000     host purple
  7  2.00000         osd.7      down        0          1.00000
  6  2.00000         osd.6        up  1.00000          1.00000
  4  2.00000         osd.4        up  1.00000          1.00000
  5  2.00000         osd.5        up  1.00000          1.00000
 -4  8.00000     host orange
 11  2.00000         osd.11     down        0          1.00000
 10  2.00000         osd.10     down        0          1.00000
  8  2.00000         osd.8      down        0          1.00000
  9  2.00000         osd.9      down        0          1.00000
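
To keep an eye on which OSD:s are down without reading the whole tree, the output can be filtered. A minimal sketch, assuming the column layout shown above (ID, weight, name, status, ...); the function name list_down_osds is made up for illustration.

```shell
# Hypothetical helper: read "ceph osd tree" output on stdin and print the
# names of all OSDs whose status column (fourth field) says "down".
list_down_osds() {
    awk '$3 ~ /^osd\./ && $4 == "down" { print $3 }'
}

# Example (commented out; needs a reachable cluster):
# ceph osd tree | list_down_osds
```

Host and root lines are skipped because their third field does not start with "osd.".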

root@black:/var/log/ceph# ceph -s
2015-11-15 21:55:27.919339 7ffb38446700  0 -- :/1336310814 >> pipe(0x7ffb34064550 sd=3 :0 s=1 pgs=0 cs=0 l=1 
    cluster ee8eae7a-5994-48bc-bd43-aa07639a543b
     health HEALTH_WARN
            1591 pgs backfill
            38 pgs backfilling
            2439 pgs degraded
            105 pgs down
            106 pgs peering
            138 pgs stale
            2439 pgs stuck degraded
            106 pgs stuck inactive
            138 pgs stuck stale
            2873 pgs stuck unclean
            2439 pgs stuck undersized
            2439 pgs undersized
            recovery 1694156/6668499 objects degraded (25.405%)
            recovery 2315800/6668499 objects misplaced (34.727%)
            too many PGs per OSD (1197 > max 350)
            1 mons down, quorum 0,1 black,purple
     monmap e3: 3 mons at 
            election epoch 448, quorum 0,1 black,purple
     mdsmap e5: 0/0/1 up
     osdmap e34098: 12 osds: 7 up, 7 in; 2024 remapped pgs
      pgmap v8211622: 4608 pgs, 3 pools, 12027 GB data, 3029 kobjects
            17141 GB used, 8927 GB / 26069 GB avail
            1694156/6668499 objects degraded (25.405%)
            2315800/6668499 objects misplaced (34.727%)
                1735 active+clean
                1590 active+undersized+degraded+remapped+wait_backfill
                 637 active+undersized+degraded
                 326 active+remapped
                 137 stale+active+undersized+degraded
                 101 down+peering
                  38 active+undersized+degraded+remapped+backfilling
                  37 active+undersized+degraded+remapped
                   4 down+remapped+peering
                   1 stale+remapped+peering
                   1 active
                   1 active+remapped+wait_backfill
recovery io 66787 kB/s, 16 objects/s