Hello all,

I've run into a problem while upgrading my single-node home cluster from
Giant to Hammer, and I would greatly appreciate any insight.

I upgraded the packages as usual, restarted the mon, and once it came back
I restarted the first OSD (osd.3). However, that OSD now fails to start,
crashing with the following failed assertion:

osd/OSD.h: 716: FAILED assert(ret)

 ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7f) [0xb1784f]
 2: (OSD::load_pgs()+0x277b) [0x6850fb]
 3: (OSD::init()+0x1448) [0x6930b8]
 4: (main()+0x26b9) [0x62fd89]
 5: (__libc_start_main()+0xed) [0x7f2345bc976d]
 6: ceph-osd() [0x635679]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.


--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 keyvaluestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
  -2/-2 (syslog threshold)
  99/99 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file
--- end dump of recent events ---

terminate called after throwing an instance of 'ceph::FailedAssertion'

*** Caught signal (Aborted) **
 in thread 7f2347f71780
 ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
 1: ceph-osd() [0xa1fe55]
 2: (()+0xfcb0) [0x7f2346fb1cb0]
 3: (gsignal()+0x35) [0x7f2345bde0d5]
 4: (abort()+0x17b) [0x7f2345be183b]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f234652f69d]
 6: (()+0xb5846) [0x7f234652d846]
 7: (()+0xb5873) [0x7f234652d873]
 8: (()+0xb596e) [0x7f234652d96e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x259) [0xb17a29]
 10: (OSD::load_pgs()+0x277b) [0x6850fb]
 11: (OSD::init()+0x1448) [0x6930b8]
 12: (main()+0x26b9) [0x62fd89]
 13: (__libc_start_main()+0xed) [0x7f2345bc976d]
 14: ceph-osd() [0x635679]

[snip -- the same 'Caught signal (Aborted)' backtrace and logging-levels
dump are repeated verbatim twice more in the log, once with a timestamp
and once inside the "dump of recent events" section]
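Regarding the NOTE about needing the executable to interpret the trace: for
anyone who wants to dig into the addresses, something like the following
should work, assuming the default package path /usr/bin/ceph-osd and that
the matching debug symbols are installed:

    # full disassembly with symbols, as the NOTE suggests
    objdump -rdS /usr/bin/ceph-osd > ceph-osd.dis
    # or resolve a single frame from the trace, e.g. OSD::load_pgs()+0x277b
    addr2line -Cfe /usr/bin/ceph-osd 0x6850fb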


I've included a 'ceph osd dump' here:
http://pastebin.com/RKbaY7nv

And here is the output of 'ceph osd tree':

ID WEIGHT   TYPE NAME              UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 24.14000 root default
-3        0     rack unknownrack
-2        0         host ceph-test
-4 24.14000     host ceph01
 0  1.50000         osd.0             down        0          1.00000
 2  1.50000         osd.2             down        0          1.00000
 3  1.50000         osd.3             down  1.00000          1.00000
 5  2.00000         osd.5               up  1.00000          1.00000
 6  2.00000         osd.6               up  1.00000          1.00000
 7  2.00000         osd.7               up  1.00000          1.00000
 8  2.00000         osd.8               up  1.00000          1.00000
 9  2.00000         osd.9               up  1.00000          1.00000
10  2.00000         osd.10              up  1.00000          1.00000
 4  4.00000         osd.4               up  1.00000          1.00000
 1  3.64000         osd.1               up  1.00000          1.00000


Note that osd.0 and osd.2 were down prior to the upgrade and the cluster
was healthy (these are failed disks that have been out for some time but
just haven't been removed from CRUSH yet).
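(For what it's worth, when I do get around to removing them, I believe the
standard sequence is roughly the following, shown here for osd.0:)

    ceph osd crush remove osd.0   # take it out of the CRUSH map
    ceph auth del osd.0           # delete its cephx key
    ceph osd rm osd.0             # remove the OSD id from the cluster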

I've also included a log with OSD debugging set to 20 here:

https://dl.dropboxusercontent.com/u/1043493/osd.3.log.gz
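(For reference, that level of verbosity should be reproducible with
something like the following, or equivalently with 'debug osd = 20' under
[osd.3] in ceph.conf:)

    # run osd.3 in the foreground with osd debugging at 20
    ceph-osd -i 3 -f --debug-osd 20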

Looking through that file, it appears the last PG it loads successfully is
2.3f6; it then moves on to 5.0:

    -3> 2015-05-18 12:25:24.292091 7f6f407f9780 10 osd.3 39533 load_pgs loaded pg[2.3f6( v 39533'289849 (37945'286848,39533'289849] local-les=39532 n=99 ec=1 les/c 39532/39532 39531/39531/39523) [5,4,3] r=2 lpr=39533 pi=34961-39530/34 crt=39533'289846 lcod 0'0 inactive NOTIFY] log((37945'286848,39533'289849], crt=39533'289846)
    -2> 2015-05-18 12:25:24.292100 7f6f407f9780 10 osd.3 39533 pgid 5.0 coll 5.0_head
    -1> 2015-05-18 12:25:24.570188 7f6f407f9780 20 osd.3 0 get_map 34144 - loading and decoding 0x411fd80
     0> 2015-05-18 12:26:02.758914 7f6f407f9780 -1 osd/OSD.h: In function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7f6f407f9780 time 2015-05-18 12:25:24.620468
osd/OSD.h: 716: FAILED assert(ret)

[snip]
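So the assert fires inside 'OSDMapRef OSDService::get_map(epoch_t)' while
loading osdmap epoch 34144, which makes me suspect osd.3 can't find (or
can't decode) that old map. I'm not sure of the exact on-disk object
naming, but something along these lines should show whether the map is
present in the OSD's meta collection (path assumes the default
/var/lib/ceph layout):

    # look for the osdmap 34144 object under osd.3's meta collection
    find /var/lib/ceph/osd/ceph-3/current/meta -name 'osdmap*34144*'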

Oddly, I don't see 5.0 anywhere in a 'ceph pg dump'.
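(I was checking along these lines; if pool 5 still existed, I'd expect its
PGs to show up:)

    # brief pg stats, filtered to pool 5's PGs
    ceph pg dump pgs_brief | grep '^5\.'
    # and whether a pool with id 5 exists at all
    ceph osd lspools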


Thanks in advance!

Berant