Hello,

I was on an old version of Ceph, and it showed a warning saying:

crush map has straw_calc_version=0

I read that adjusting it would just rebalance everything, so the admin should choose when to do it. So I went straight ahead and ran:


ceph osd crush tunables optimal
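
In case it's useful: the active tunables can be inspected first, and I later read that straw_calc_version can be set on its own instead of jumping to the whole "optimal" profile. A rough sketch (the option names are the Jewel-era ones, so treat that as an assumption for other releases):

# show the active CRUSH tunables, including straw_calc_version
ceph osd crush show-tunables

# set only straw_calc_version (the one tunable set-tunable accepts)
ceph osd crush set-tunable straw_calc_version 1

# optionally throttle the resulting data movement while it runs
ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'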

It rebalanced as expected, but then I started to see lots of PGs in a bad state. I discovered that it was because of my OSD1. I thought it was a disk failure, so I added a new OSD6 and the system started to rebalance. However, OSD1 was still not starting.

I thought about wiping it all, but I preferred to leave the disk as it was, with the journal intact, in case I can recover data from it. (See the thread: [ceph-users] Scrub failing all the time, new inconsistencies keep appearing.)
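
In case it helps anyone in the same spot: stopping the daemon and marking the OSD out leaves the disk and journal untouched while the cluster rebalances around it. A minimal sketch, assuming a systemd-managed install:

# mark the OSD out so data is re-replicated elsewhere; its disk is not touched
ceph osd out 1

# stop the daemon so nothing writes to the disk anymore
systemctl stop ceph-osd@1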


So here's the information, though it already shows OSD1 replaced by OSD3, sorry.

ID WEIGHT  REWEIGHT SIZE  USE   AVAIL %USE  VAR PGS
 0 1.00000  1.00000  926G  271G  654G 29.34 1.10 369
 2 1.00000  1.00000  460G  284G  176G 61.67 2.32 395
 4 1.00000  1.00000  465G  151G  313G 32.64 1.23 214
 3 1.36380  1.00000 1396G  239G 1157G 17.13 0.64 340
 6 0.90919  1.00000  931G  164G  766G 17.70 0.67 210
              TOTAL 4179G 1111G 3067G 26.60
MIN/MAX VAR: 0.64/2.32  STDDEV: 16.99

As I said, I still have OSD1 intact, so I can do whatever you need with it, except re-adding it to the cluster: since I don't know what that would do, it might cause havoc.
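
If it's of any use, my understanding is that the data on the stopped OSD can be examined and exported offline with ceph-objectstore-tool. A rough sketch, assuming default FileStore paths and taking a PG id from the log below as an example:

# list the PGs present on the offline OSD
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1 \
    --journal-path /var/lib/ceph/osd/ceph-1/journal \
    --op list-pgs

# export a single PG for safekeeping
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1 \
    --journal-path /var/lib/ceph/osd/ceph-1/journal \
    --op export --pgid 10.8c --file /tmp/pg.10.8c.export
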
Best regards,

On 14/09/17 17:12, David Turner wrote:
What do you mean by "updated crush map to 1"? Can you please provide a copy of your crush map and `ceph osd df`?
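
(For reference, a readable copy of the crush map is usually produced along these lines, assuming crushtool is available on the node:

ceph osd getcrushmap -o crush.bin
crushtool -d crush.bin -o crush.txt

and `ceph osd df` can just be pasted as-is.)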

On Wed, Sep 13, 2017 at 6:39 AM Gonzalo Aguilar Delgado <gagui...@aguilardelgado.com> wrote:

    Hi,

    I recently updated the crush map to 1 and all the PGs were
    relocated. At the end I found that one of the OSDs is not starting.

    This is what it shows:


    2017-09-13 10:37:34.287248 7f49cbe12700 -1 *** Caught signal
    (Aborted) **
     in thread 7f49cbe12700 thread_name:filestore_sync

     ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)
     1: (()+0x9616ee) [0xa93c6ef6ee]
     2: (()+0x11390) [0x7f49d9937390]
     3: (gsignal()+0x38) [0x7f49d78d3428]
     4: (abort()+0x16a) [0x7f49d78d502a]
     5: (ceph::__ceph_assert_fail(char const*, char const*, int, char
    const*)+0x26b) [0xa93c7ef43b]
     6: (FileStore::sync_entry()+0x2bbb) [0xa93c47fcbb]
     7: (FileStore::SyncThread::entry()+0xd) [0xa93c4adcdd]
     8: (()+0x76ba) [0x7f49d992d6ba]
     9: (clone()+0x6d) [0x7f49d79a53dd]
     NOTE: a copy of the executable, or `objdump -rdS <executable>` is
    needed to interpret this.

    --- begin dump of recent events ---
        -3> 2017-09-13 10:37:34.253808 7f49dac6e8c0  5 osd.1 pg_epoch:
    6293 pg[10.8c( v 6220'575937 (4942'572901,6220'575937]
    local-les=6235 n=282 ec=419 les/c/f 6235/6235/0 6293/6293/6290)
    [1,2]/[2] r=-1 lpr=0 pi=6234-6292/24 crt=6220'575937 lcod 0'0
    inactive NOTIFY NIBBLEWISE] exit Initial 0.029683 0 0.000000
        -2> 2017-09-13 10:37:34.253848 7f49dac6e8c0  5 osd.1 pg_epoch:
    6293 pg[10.8c( v 6220'575937 (4942'572901,6220'575937]
    local-les=6235 n=282 ec=419 les/c/f 6235/6235/0 6293/6293/6290)
    [1,2]/[2] r=-1 lpr=0 pi=6234-6292/24 crt=6220'575937 lcod 0'0
    inactive NOTIFY NIBBLEWISE] enter Reset
        -1> 2017-09-13 10:37:34.255018 7f49dac6e8c0  5 osd.1 pg_epoch:
    6293 pg[10.90(unlocked)] enter Initial
         0> 2017-09-13 10:37:34.287248 7f49cbe12700 -1 *** Caught
    signal (Aborted) **
     in thread 7f49cbe12700 thread_name:filestore_sync

     ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)
     1: (()+0x9616ee) [0xa93c6ef6ee]
     2: (()+0x11390) [0x7f49d9937390]
     3: (gsignal()+0x38) [0x7f49d78d3428]
     4: (abort()+0x16a) [0x7f49d78d502a]
     5: (ceph::__ceph_assert_fail(char const*, char const*, int, char
    const*)+0x26b) [0xa93c7ef43b]
     6: (FileStore::sync_entry()+0x2bbb) [0xa93c47fcbb]
     7: (FileStore::SyncThread::entry()+0xd) [0xa93c4adcdd]
     8: (()+0x76ba) [0x7f49d992d6ba]
     9: (clone()+0x6d) [0x7f49d79a53dd]
     NOTE: a copy of the executable, or `objdump -rdS <executable>` is
    needed to interpret this.

    --- logging levels ---
       0/ 5 none
       0/ 1 lockdep
       0/ 1 context
       1/ 1 crush
       1/ 5 mds
       1/ 5 mds_balancer
       1/ 5 mds_locker
       1/ 5 mds_log
       1/ 5 mds_log_expire
       1/ 5 mds_migrator
       0/ 1 buffer
       0/ 1 timer
       0/ 1 filer
       0/ 1 striper
       0/ 1 objecter
       0/ 5 rados
       0/ 5 rbd
       0/ 5 rbd_mirror
       0/ 5 rbd_replay
       0/ 5 journaler
       0/ 5 objectcacher
       0/ 5 client
       0/ 5 osd
       0/ 5 optracker
       0/ 5 objclass
       1/ 3 filestore
       1/ 3 journal
       0/ 5 ms
       1/ 5 mon
       0/10 monc
       1/ 5 paxos
       0/ 5 tp
       1/ 5 auth
       1/ 5 crypto
       1/ 1 finisher
       1/ 5 heartbeatmap
       1/ 5 perfcounter
       1/ 5 rgw
       1/10 civetweb
       1/ 5 javaclient
       1/ 5 asok
       1/ 1 throttle
       0/ 0 refs
       1/ 5 xio
       1/ 5 compressor
       1/ 5 newstore
       1/ 5 bluestore
       1/ 5 bluefs
       1/ 3 bdev
       1/ 5 kstore
       4/ 5 rocksdb
       4/ 5 leveldb
       1/ 5 kinetic
       1/ 5 fuse
      -2/-2 (syslog threshold)
      -1/-1 (stderr threshold)
      max_recent     10000
      max_new         1000
      log_file /var/log/ceph/ceph-osd.1.log
    --- end dump of recent events ---



    Is there any way to recover it or should I open a bug?
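
    In case a bug report is the way to go: the log asks for a copy of
    the executable or its disassembly. I suppose that would be
    something like the following, assuming the standard binary path:

    # annotated disassembly the crash log asks for (output is large)
    objdump -rdS /usr/bin/ceph-osd > ceph-osd.objdump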


    Best regards

    _______________________________________________
    ceph-users mailing list
    ceph-users@lists.ceph.com
    http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


