Sam,

Thanks for taking a look. That does seem to match my issue. Would simply
removing the 5.0_head directory be appropriate, or would it be better to use
ceph-objectstore-tool?
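
Just to spell out what I mean by the two options (paths assume osd.3's
default filestore layout, so please correct me if I have either invocation
wrong):

    # option 1: with the OSD stopped, move the stray PG directory aside
    mv /var/lib/ceph/osd/ceph-3/current/5.0_head /root/5.0_head.bak

    # option 2: with the OSD stopped, remove the PG via ceph-objectstore-tool
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-3 \
        --journal-path /var/lib/ceph/osd/ceph-3/journal \
        --pgid 5.0 --op remove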

Thanks,
Berant

On Mon, May 18, 2015 at 1:47 PM, Samuel Just <sj...@redhat.com> wrote:

> You have most likely hit http://tracker.ceph.com/issues/11429.  There are
> some workarounds in the bugs marked as duplicates of that bug, or you can
> wait for the next hammer point release.
> -Sam
>
> ----- Original Message -----
> From: "Berant Lemmenes" <ber...@lemmenes.com>
> To: ceph-users@lists.ceph.com
> Sent: Monday, May 18, 2015 10:24:38 AM
> Subject: [ceph-users] OSD unable to start (giant -> hammer)
>
> Hello all,
>
> I've encountered a problem when upgrading my single-node home cluster from
> giant to hammer, and I would greatly appreciate any insight.
>
> I upgraded the packages as usual, restarted the mon, and once it came back up
> I restarted the first OSD (osd.3). However, that OSD now fails to start and
> crashes with the following failed assertion:
>
>
>
> osd/OSD.h: 716: FAILED assert(ret)
>
> ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7f) [0xb1784f]
> 2: (OSD::load_pgs()+0x277b) [0x6850fb]
> 3: (OSD::init()+0x1448) [0x6930b8]
> 4: (main()+0x26b9) [0x62fd89]
> 5: (__libc_start_main()+0xed) [0x7f2345bc976d]
> 6: ceph-osd() [0x635679]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> --- logging levels ---
> 0/ 5 none
> 0/ 1 lockdep
> 0/ 1 context
> 1/ 1 crush
> 1/ 5 mds
> 1/ 5 mds_balancer
> 1/ 5 mds_locker
> 1/ 5 mds_log
> 1/ 5 mds_log_expire
> 1/ 5 mds_migrator
> 0/ 1 buffer
> 0/ 1 timer
> 0/ 1 filer
> 0/ 1 striper
> 0/ 1 objecter
> 0/ 5 rados
> 0/ 5 rbd
> 0/ 5 rbd_replay
> 0/ 5 journaler
> 0/ 5 objectcacher
> 0/ 5 client
> 0/ 5 osd
> 0/ 5 optracker
> 0/ 5 objclass
> 1/ 3 filestore
> 1/ 3 keyvaluestore
> 1/ 3 journal
> 0/ 5 ms
> 1/ 5 mon
> 0/10 monc
> 1/ 5 paxos
> 0/ 5 tp
> 1/ 5 auth
> 1/ 5 crypto
> 1/ 1 finisher
> 1/ 5 heartbeatmap
> 1/ 5 perfcounter
> 1/ 5 rgw
> 1/10 civetweb
> 1/ 5 javaclient
> 1/ 5 asok
> 1/ 1 throttle
> 0/ 0 refs
> 1/ 5 xio
> -2/-2 (syslog threshold)
> 99/99 (stderr threshold)
> max_recent 10000
> max_new 1000
> log_file
> --- end dump of recent events ---
>
> terminate called after throwing an instance of 'ceph::FailedAssertion'
> *** Caught signal (Aborted) **
> in thread 7f2347f71780
>
> ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
> 1: ceph-osd() [0xa1fe55]
> 2: (()+0xfcb0) [0x7f2346fb1cb0]
> 3: (gsignal()+0x35) [0x7f2345bde0d5]
> 4: (abort()+0x17b) [0x7f2345be183b]
> 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f234652f69d]
> 6: (()+0xb5846) [0x7f234652d846]
> 7: (()+0xb5873) [0x7f234652d873]
> 8: (()+0xb596e) [0x7f234652d96e]
> 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x259) [0xb17a29]
> 10: (OSD::load_pgs()+0x277b) [0x6850fb]
> 11: (OSD::init()+0x1448) [0x6930b8]
> 12: (main()+0x26b9) [0x62fd89]
> 13: (__libc_start_main()+0xed) [0x7f2345bc976d]
> 14: ceph-osd() [0x635679]
>
> 2015-05-18 13:02:33.643064 7f2347f71780 -1 *** Caught signal (Aborted) **
> in thread 7f2347f71780
>
> ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
> 1: ceph-osd() [0xa1fe55]
> 2: (()+0xfcb0) [0x7f2346fb1cb0]
> 3: (gsignal()+0x35) [0x7f2345bde0d5]
> 4: (abort()+0x17b) [0x7f2345be183b]
> 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f234652f69d]
> 6: (()+0xb5846) [0x7f234652d846]
> 7: (()+0xb5873) [0x7f234652d873]
> 8: (()+0xb596e) [0x7f234652d96e]
> 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x259) [0xb17a29]
> 10: (OSD::load_pgs()+0x277b) [0x6850fb]
> 11: (OSD::init()+0x1448) [0x6930b8]
> 12: (main()+0x26b9) [0x62fd89]
> 13: (__libc_start_main()+0xed) [0x7f2345bc976d]
> 14: ceph-osd() [0x635679]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
>
>
>
> --- begin dump of recent events ---
> 0> 2015-05-18 13:02:33.643064 7f2347f71780 -1 *** Caught signal (Aborted) **
> in thread 7f2347f71780
>
> ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
> 1: ceph-osd() [0xa1fe55]
> 2: (()+0xfcb0) [0x7f2346fb1cb0]
> 3: (gsignal()+0x35) [0x7f2345bde0d5]
> 4: (abort()+0x17b) [0x7f2345be183b]
> 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f234652f69d]
> 6: (()+0xb5846) [0x7f234652d846]
> 7: (()+0xb5873) [0x7f234652d873]
> 8: (()+0xb596e) [0x7f234652d96e]
> 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x259) [0xb17a29]
> 10: (OSD::load_pgs()+0x277b) [0x6850fb]
> 11: (OSD::init()+0x1448) [0x6930b8]
> 12: (main()+0x26b9) [0x62fd89]
> 13: (__libc_start_main()+0xed) [0x7f2345bc976d]
> 14: ceph-osd() [0x635679]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
>
>
>
> --- logging levels ---
> 0/ 5 none
> 0/ 1 lockdep
> 0/ 1 context
> 1/ 1 crush
> 1/ 5 mds
> 1/ 5 mds_balancer
> 1/ 5 mds_locker
> 1/ 5 mds_log
> 1/ 5 mds_log_expire
> 1/ 5 mds_migrator
> 0/ 1 buffer
> 0/ 1 timer
> 0/ 1 filer
> 0/ 1 striper
> 0/ 1 objecter
> 0/ 5 rados
> 0/ 5 rbd
> 0/ 5 rbd_replay
> 0/ 5 journaler
> 0/ 5 objectcacher
> 0/ 5 client
> 0/ 5 osd
> 0/ 5 optracker
> 0/ 5 objclass
> 1/ 3 filestore
> 1/ 3 keyvaluestore
> 1/ 3 journal
> 0/ 5 ms
> 1/ 5 mon
> 0/10 monc
> 1/ 5 paxos
> 0/ 5 tp
> 1/ 5 auth
> 1/ 5 crypto
> 1/ 1 finisher
> 1/ 5 heartbeatmap
> 1/ 5 perfcounter
> 1/ 5 rgw
> 1/10 civetweb
> 1/ 5 javaclient
> 1/ 5 asok
> 1/ 1 throttle
> 0/ 0 refs
> 1/ 5 xio
> -2/-2 (syslog threshold)
> 99/99 (stderr threshold)
> max_recent 10000
> max_new 1000
> log_file
> --- end dump of recent events ---
>
>
> I've included a 'ceph osd dump' here:
> http://pastebin.com/RKbaY7nv
>
> ceph osd tree:
>
> ID WEIGHT   TYPE NAME         UP/DOWN REWEIGHT PRIMARY-AFFINITY
> -1 24.14000 root default
> -3        0 rack unknownrack
> -2        0 host ceph-test
> -4 24.14000 host ceph01
>  0  1.50000     osd.0            down        0          1.00000
>  2  1.50000     osd.2            down        0          1.00000
>  3  1.50000     osd.3            down  1.00000          1.00000
>  5  2.00000     osd.5              up  1.00000          1.00000
>  6  2.00000     osd.6              up  1.00000          1.00000
>  7  2.00000     osd.7              up  1.00000          1.00000
>  8  2.00000     osd.8              up  1.00000          1.00000
>  9  2.00000     osd.9              up  1.00000          1.00000
> 10  2.00000     osd.10             up  1.00000          1.00000
>  4  4.00000     osd.4              up  1.00000          1.00000
>  1  3.64000     osd.1              up  1.00000          1.00000
>
>
>
>
> Note that osd.0 and osd.2 were down prior to the upgrade and the cluster
> was healthy; these are failed disks that have been out for some time but
> were never removed from CRUSH.
>
> I've also included a log with OSD debugging set to 20 here:
>
> https://dl.dropboxusercontent.com/u/1043493/osd.3.log.gz
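>
> (For reference, a log like this can be captured by running the OSD in the
> foreground with the debug level raised, e.g. something along the lines of
> `ceph-osd -i 3 -f --debug-osd 20`; the exact invocation I used may have
> differed slightly, and the output lands in the usual
> /var/log/ceph/ceph-osd.3.log.)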
>
>
> Looking through that file, it appears the last PG that loads successfully is
> 2.3f6; it then moves on to 5.0:
>
>  -3> 2015-05-18 12:25:24.292091 7f6f407f9780 10 osd.3 39533 load_pgs loaded
> pg[2.3f6( v 39533'289849 (37945'286848,39533'289849] local-les=39532 n=99
> ec=1 les/c 39532/39532 39531/39531/39523) [5,4,3] r=2 lpr=39533
> pi=34961-39530/34 crt=39533'289846 lcod 0'0 inactive NOTIFY]
> log((37945'286848,39533'289849], crt=39533'289846)
>  -2> 2015-05-18 12:25:24.292100 7f6f407f9780 10 osd.3 39533 pgid 5.0 coll 5.0_head
>  -1> 2015-05-18 12:25:24.570188 7f6f407f9780 20 osd.3 0 get_map 34144 - loading and decoding 0x411fd80
>   0> 2015-05-18 12:26:02.758914 7f6f407f9780 -1 osd/OSD.h: In function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7f6f407f9780 time 2015-05-18 12:25:24.620468
>
> osd/OSD.h: 716: FAILED assert(ret)
>
> [snip]
>
> However, I don't see 5.0 anywhere in a pg dump.
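>
> (i.e. nothing matching 5.0 shows up with something along the lines of
> `ceph pg dump | grep '^5\.0'`.)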
>
>
>
>
> Thanks in advance!
>
> Berant
>
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
