I think I may have experienced something similar after upgrading to Infernalis as well. After rebooting all the mons and OSD nodes, everything returned to normal. I wasn’t suspicious of it at the time, but seeing this has got me thinking.
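In case it matters: I went straight for full reboots, but restarting just the daemons would probably have been worth trying first. On Ubuntu 14.04 they are managed by upstart, so per node it should be something like this (the ids are just examples from this thread):

    # restart a single mon / OSD via upstart
    sudo restart ceph-mon id=black
    sudo restart ceph-osd id=11
    # or everything ceph-related on the node in one go
    sudo restart ceph-all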
I was seeing the same in the logs as you, the last line “done with init, starting boot process” and then nothing. I was also seeing the peering and activating stages take 1hr+.

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Josef Johansson
Sent: 15 November 2015 22:42
To: Claes Sahlström <cl...@verymetal.com>
Cc: ceph-users <ceph-users@lists.ceph.com>
Subject: Re: [ceph-users] OSD:s failing out after upgrade to 9.2.0 on Ubuntu 14.04

cc the list as well

On 15 Nov 2015, at 23:41, Josef Johansson <jose...@gmail.com> wrote:

Hi,

So it’s just frozen at that point? You should definitely increase the logging and restart the OSD. I believe it’s debug osd 20 and debug mon 20.

A quick google brings up a case where a UUID problem was crashing the OSD:
http://serverfault.com/questions/671372/ceph-osd-always-down-in-ubuntu-14-04-1

/Josef
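PS: the levels can also be raised without editing ceph.conf; from memory (untested against 9.2.0) it is something like:

    # on the OSD's host, through the admin socket; should work even while
    # the OSD is marked down, as long as the process is still running
    sudo ceph daemon osd.11 config set debug_osd 20
    sudo ceph daemon osd.11 config set debug_ms 1

    # the ceph.conf equivalent, under [osd], followed by a restart:
    #   debug osd = 20
    #   debug ms = 1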
On 15 Nov 2015, at 23:29, Claes Sahlström <cl...@verymetal.com> wrote:

Hi and thanks for helping.

None that I can see when scanning the logfile; it actually looks to me like it starts up just fine when I start the OSD. This is from the last time I restarted it:

    2015-11-15 22:58:13.445684 7f6f8f9be940  0 set uid:gid to 0:0
    2015-11-15 22:58:13.445854 7f6f8f9be940  0 ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299), process ceph-osd, pid 5463
    2015-11-15 22:58:13.510385 7f6f8f9be940  0 filestore(/ceph/osd.11) backend xfs (magic 0x58465342)
    2015-11-15 22:58:13.511120 7f6f8f9be940  0 genericfilestorebackend(/ceph/osd.11) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
    2015-11-15 22:58:13.511129 7f6f8f9be940  0 genericfilestorebackend(/ceph/osd.11) detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
    2015-11-15 22:58:13.511158 7f6f8f9be940  0 genericfilestorebackend(/ceph/osd.11) detect_features: splice is supported
    2015-11-15 22:58:13.515688 7f6f8f9be940  0 genericfilestorebackend(/ceph/osd.11) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
    2015-11-15 22:58:13.515934 7f6f8f9be940  0 xfsfilestorebackend(/ceph/osd.11) detect_features: extsize is supported and your kernel >= 3.5
    2015-11-15 22:58:13.600801 7f6f8f9be940  0 filestore(/ceph/osd.11) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
    2015-11-15 22:58:39.150619 7f6f8f9be940  1 journal _open /dev/orange/journal-osd.11 fd 19: 23622320128 bytes, block size 4096 bytes, directio = 1, aio = 1
    2015-11-15 22:58:39.160621 7f6f8f9be940  1 journal _open /dev/orange/journal-osd.11 fd 19: 23622320128 bytes, block size 4096 bytes, directio = 1, aio = 1
    2015-11-15 22:58:39.192660 7f6f8f9be940  1 filestore(/ceph/osd.11) upgrade
    2015-11-15 22:58:39.200192 7f6f8f9be940  0 <cls> cls/cephfs/cls_cephfs.cc:136: loading cephfs_size_scan
    2015-11-15 22:58:39.200457 7f6f8f9be940  0 <cls> cls/hello/cls_hello.cc:305: loading cls_hello
    2015-11-15 22:58:39.206906 7f6f8f9be940  0 osd.11 35462 crush map has features 1107558400, adjusting msgr requires for clients
    2015-11-15 22:58:39.206983 7f6f8f9be940  0 osd.11 35462 crush map has features 1107558400 was 8705, adjusting msgr requires for mons
    2015-11-15 22:58:39.207030 7f6f8f9be940  0 osd.11 35462 crush map has features 1107558400, adjusting msgr requires for osds
    2015-11-15 22:58:40.712757 7f6f8f9be940  0 osd.11 35462 load_pgs
    2015-11-15 22:59:09.980042 7f6f8f9be940  0 osd.11 35462 load_pgs opened 874 pgs
    2015-11-15 22:59:09.981963 7f6f8f9be940 -1 osd.11 35462 log_to_monitors {default=true}
    2015-11-15 22:59:09.990204 7f6f71312700  0 osd.11 35462 ignoring osdmap until we have initialized
    2015-11-15 22:59:11.194276 7f6f8f9be940  0 osd.11 35462 done with init, starting boot process

From: Josef Johansson [mailto:jose...@gmail.com]
Sent: 15 November 2015 23:10
To: Claes Sahlström <cl...@verymetal.com>
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] OSD:s failing out after upgrade to 9.2.0 on Ubuntu 14.04

Hi,

Could you catch any segmentation faults in /var/log/ceph/ceph-osd.11.log?

Regards,
Josef

On 15 Nov 2015, at 23:06, Claes Sahlström <cl...@verymetal.com> wrote:

Sorry to almost double post. I noticed that it seems like one mon is down, but the mons do actually seem to be OK. 4 of the 11 OSDs that are in have fallen out again and I am back at 7 healthy OSDs:

    root@black:/var/lib/ceph/mon# ceph -s
        cluster ee8eae7a-5994-48bc-bd43-aa07639a543b
         health HEALTH_WARN
                108 pgs backfill
                37 pgs backfilling
                2339 pgs degraded
                105 pgs down
                237 pgs peering
                138 pgs stale
                765 pgs stuck degraded
                173 pgs stuck inactive
                138 pgs stuck stale
                3327 pgs stuck unclean
                765 pgs stuck undersized
                2339 pgs undersized
                recovery 1612956/6242357 objects degraded (25.839%)
                recovery 772311/6242357 objects misplaced (12.372%)
                too many PGs per OSD (561 > max 350)
                4/11 in osds are down
         monmap e3: 3 mons at {black=172.16.0.201:6789/0,orange=172.16.0.203:6789/0,purple=172.16.0.202:6789/0}
                election epoch 456, quorum 0,1,2 black,purple,orange
         mdsmap e5: 0/0/1 up
         osdmap e35627: 12 osds: 7 up, 11 in; 1201 remapped pgs
          pgmap v8215121: 4608 pgs, 3 pools, 11897 GB data, 2996 kobjects
                17203 GB used, 8865 GB / 26069 GB avail
                1612956/6242357 objects degraded (25.839%)
                772311/6242357 objects misplaced (12.372%)
                    2137 active+undersized+degraded
                    1052 active+clean
                     783 active+remapped
                     137 stale+active+undersized+degraded
                     104 down+peering
                     102 active+remapped+wait_backfill
                      66 remapped+peering
                      65 peering
                      33 active+remapped+backfilling
                      27 activating+undersized+degraded
                      26 active+undersized+degraded+remapped
                      25 activating
                      16 remapped
                      14 inactive
                       7 activating+remapped
                       6 active+undersized+degraded+remapped+wait_backfill
                       4 active+undersized+degraded+remapped+backfilling
                       2 activating+undersized+degraded+remapped
                       1 down+remapped+peering
                       1 stale+remapped+peering
      recovery io 22108 MB/s, 5581 objects/s
      client io 1065 MB/s rd, 2317 MB/s wr, 11435 op/s

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Claes Sahlström
Sent: 15 November 2015 21:56
To: ceph-users@lists.ceph.com
Subject: [ceph-users] OSD:s failing out after upgrade to 9.2.0 on Ubuntu 14.04

Hi,

I have a problem I hope is possible to solve…

I upgraded to 9.2.0 a couple of days back and I missed this part: “If your systems already have a ceph user, upgrading the package will cause problems. We suggest you first remove or rename the existing ‘ceph’ user and ‘ceph’ group before upgrading.”

I guess that might be the reason why my OSDs have started to die on me. I can get the osd services to start by keeping the file permissions as root:root and using:

    setuser match path = /var/lib/ceph/$type/$cluster-$id

I am really not sure where to look to find out what is wrong. Right after the upgrade, when the OSDs were restarted, I got permission denied on the osd directories; that was solved by adding the “setuser match” line to ceph.conf.

With 5 of 12 OSDs down I am starting to worry, and since I only have one replica I might lose some data.
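For reference, what the release notes recommend instead of my root:root workaround is to hand ownership of the data over to the ceph user. I have not dared to run this yet, so treat it as a sketch:

    # stop the ceph daemons on the node first, then (can take a while
    # on a full OSD):
    sudo chown -R ceph:ceph /var/lib/ceph
    # journals living outside /var/lib/ceph (mine are on /dev/orange/*)
    # would need their ownership fixed as well before the daemons can
    # start as ceph:ceph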
As I mentioned, the OSD services start and “ceph osd in” does not give me any error, but the OSD never comes up. Any suggestions or helpful tips are most welcome,

/Claes

    ID WEIGHT   TYPE NAME       UP/DOWN REWEIGHT PRIMARY-AFFINITY
    -1 24.00000 root default
    -2  8.00000     host black
     3  2.00000         osd.3        up  1.00000          1.00000
     2  2.00000         osd.2        up  1.00000          1.00000
     0  2.00000         osd.0        up  1.00000          1.00000
     1  2.00000         osd.1        up  1.00000          1.00000
    -3  8.00000     host purple
     7  2.00000         osd.7      down        0          1.00000
     6  2.00000         osd.6        up  1.00000          1.00000
     4  2.00000         osd.4        up  1.00000          1.00000
     5  2.00000         osd.5        up  1.00000          1.00000
    -4  8.00000     host orange
    11  2.00000         osd.11     down        0          1.00000
    10  2.00000         osd.10     down        0          1.00000
     8  2.00000         osd.8      down        0          1.00000
     9  2.00000         osd.9      down        0          1.00000

    root@black:/var/log/ceph# ceph -s
    2015-11-15 21:55:27.919339 7ffb38446700  0 -- :/1336310814 >> 172.16.0.203:6789/0 pipe(0x7ffb34064550 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7ffb3405e000).fault
        cluster ee8eae7a-5994-48bc-bd43-aa07639a543b
         health HEALTH_WARN
                1591 pgs backfill
                38 pgs backfilling
                2439 pgs degraded
                105 pgs down
                106 pgs peering
                138 pgs stale
                2439 pgs stuck degraded
                106 pgs stuck inactive
                138 pgs stuck stale
                2873 pgs stuck unclean
                2439 pgs stuck undersized
                2439 pgs undersized
                recovery 1694156/6668499 objects degraded (25.405%)
                recovery 2315800/6668499 objects misplaced (34.727%)
                too many PGs per OSD (1197 > max 350)
                1 mons down, quorum 0,1 black,purple
         monmap e3: 3 mons at {black=172.16.0.201:6789/0,orange=172.16.0.203:6789/0,purple=172.16.0.202:6789/0}
                election epoch 448, quorum 0,1 black,purple
         mdsmap e5: 0/0/1 up
         osdmap e34098: 12 osds: 7 up, 7 in; 2024 remapped pgs
          pgmap v8211622: 4608 pgs, 3 pools, 12027 GB data, 3029 kobjects
                17141 GB used, 8927 GB / 26069 GB avail
                1694156/6668499 objects degraded (25.405%)
                2315800/6668499 objects misplaced (34.727%)
                    1735 active+clean
                    1590 active+undersized+degraded+remapped+wait_backfill
                     637 active+undersized+degraded
                     326 active+remapped
                     137 stale+active+undersized+degraded
                     101 down+peering
                      38 active+undersized+degraded+remapped+backfilling
                      37 active+undersized+degraded+remapped
                       4 down+remapped+peering
                       1 stale+remapped+peering
                       1 active
                       1 active+remapped+wait_backfill
      recovery io 66787 kB/s, 16 objects/s
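If more detail would help, I believe the admin socket on the OSD hosts can show what state a stuck-but-running daemon is in; something along these lines (I have not tried it yet):

    # run on the host of the down OSD; reports whether it is still "booting"
    sudo ceph daemon osd.11 status
    # and list any operations it is blocked on
    sudo ceph daemon osd.11 dump_ops_in_flight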
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com