Hi!

Originally your issue looked like the one from https://tracker.ceph.com/issues/42223,
i.e. some key information for the FreeListManager seems to be missing from
RocksDB.
Once you have such an OSD available we can check the content of RocksDB to
verify this hypothesis; please let me know if you want a guideline for
that.

The last log is different; the key record is probably this one:

-2> 2019-10-09 23:03:47.011 7fb4295a7700 -1 rocksdb: submit_common error: Corruption: block checksum mismatch: expected 2181709173, got 2130853119  in db/204514.sst offset 0 size 61648 code = 2 Rocksdb transaction:
which most probably denotes data corruption in the DB. Unfortunately, for now
I can't say whether this is related to the original issue or not.
This time it resembles an issue shared on this mailing list a while ago
by Stefan Priebe, in the post titled "Bluestore OSDs keep crashing in
BlueStore.cc: 8808: FAILED assert(r == 0)".
So first of all I'd suggest treating these as separate issues for now and
troubleshooting them independently.

As for the first case, I'm wondering if you have any OSDs still failing this way, i.e. asserting in the allocator and showing zero extents loaded: "_open_alloc loaded 0 B in 0 extents"
If so, let's check the DB content first.
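For reference, that check could look roughly like the sketch below (the OSD id and path are the ones from this thread; the OSD must be stopped before pointing ceph-kvstore-tool at its store, and the simulated dump merely stands in for real output):

```shell
# Hypothetical sketch: dump all keys from the OSD's RocksDB, e.g. with
#   ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-71 list > /tmp/keys.txt
# (run only while the OSD is stopped), then count the FreeListManager
# records, which live under the "b" prefix.
# The lines below simulate such a dump; the exact output format may differ.
printf '%s\n' 'b 0000000000100000' 'O object_key' 'S blobid_max' > /tmp/keys.txt

# Zero "b" records would support the "missing freelist metadata" hypothesis.
grep -c '^b' /tmp/keys.txt
```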


For the second case, I'm mostly wondering whether the issue is permanent for a specific OSD, or whether it disappears after an OSD/node restart, as it did in Stefan's case.

Thanks,

Igor


On 10/10/2019 1:59 PM, cephuser2345 user wrote:
Hi Igor,
since the last OSD crash we have had about 4 more. We tried to check RocksDB with ceph-kvstore-tool:
ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-71 compact
ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-71 repair
ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-71 destructive-repair
Nothing helped, so we had to redeploy the OSD by removing it from the
cluster and reinstalling it.
We updated to Ceph 14.2.4 two or more weeks ago, but OSDs are still
failing in the same way.
I have managed to capture the first fault by using: ceph crash ls
I added the log+meta to this email.
Can these logs shed some light on this?

    On Thu, Sep 12, 2019 at 7:20 PM Igor Fedotov <ifedo...@suse.de> wrote:

        Hi,

        this line:

            -2> 2019-09-12 16:38:15.101 7fcd02fd1f80  1
        bluestore(/var/lib/ceph/osd/ceph-71) _open_alloc loaded 0 B in
        0 extents

        tells me that the OSD is unable to load the free list manager
        properly, i.e. the list of free/allocated blocks is unavailable.

        You might want to set 'debug bluestore = 10' and check the
        additional log output between

        these two lines:

            -3> 2019-09-12 16:38:15.093 7fcd02fd1f80  1
        bluestore(/var/lib/ceph/osd/ceph-71) _open_alloc opening
        allocation metadata
            -2> 2019-09-12 16:38:15.101 7fcd02fd1f80  1
        bluestore(/var/lib/ceph/osd/ceph-71) _open_alloc loaded 0 B in
        0 extents

        And/or check RocksDB records prefixed with "b" prefix using
        ceph-kvstore-tool.
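        As a rough sketch of that log check (the fragment below is just the two lines quoted above; against a real log taken at debug bluestore = 10, the same sed range would print everything logged between them):

```shell
# Simulated OSD log fragment containing the two _open_alloc lines.
cat > /tmp/osd71.log <<'EOF'
-3> 2019-09-12 16:38:15.093 7fcd02fd1f80  1 bluestore(/var/lib/ceph/osd/ceph-71) _open_alloc opening allocation metadata
-2> 2019-09-12 16:38:15.101 7fcd02fd1f80  1 bluestore(/var/lib/ceph/osd/ceph-71) _open_alloc loaded 0 B in 0 extents
EOF

# Print everything between "opening allocation metadata" and "loaded ... extents";
# at debug bluestore = 10 this range holds the interesting detail.
sed -n '/_open_alloc opening/,/_open_alloc loaded/p' /tmp/osd71.log
```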


        Igor


        P.S.

        Sorry, might be unresponsive for the next two week as I'm
        going on vacation.


        On 9/12/2019 7:04 PM, cephuser2345 user wrote:
        Hi
        we have updated the Ceph version from 14.2.2 to 14.2.3.
        The osd tree now shows:

          -21        76.68713     host osd048
         66   hdd  12.78119         osd.66      up  1.00000 1.00000
         67   hdd  12.78119         osd.67      up  1.00000 1.00000
         68   hdd  12.78119         osd.68      up  1.00000 1.00000
         69   hdd  12.78119         osd.69      up  1.00000 1.00000
         70   hdd  12.78119         osd.70      up  1.00000 1.00000
         71   hdd  12.78119         osd.71    down  0 1.00000

        we cannot get the OSD up; the same error is happening on a lot
        of OSDs.
        Can you please assist? :)  I've added a txt log:
        bluestore(/var/lib/ceph/osd/ceph-71) _open_alloc opening
        allocation metadata
            -2> 2019-09-12 16:38:15.101 7fcd02fd1f80  1
        bluestore(/var/lib/ceph/osd/ceph-71) _open_alloc loaded 0 B
        in 0 extents
            -1> 2019-09-12 16:38:15.101 7fcd02fd1f80 -1
        /build/ceph-14.2.3/src/os/bluestore/fastbmap_allocator_impl.h:
        In function 'void
        AllocatorLevel02<T>::_mark_allocated(uint64_t, uint64_t)
        [with L1 = AllocatorLevel01Loose; uint64_t = long unsigned
        int]' thread 7fcd02fd1f80 time 2019-09-12 16:38:15.102539

        _______________________________________________
        ceph-users mailing list
        ceph-users@lists.ceph.com
        http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com