From: Igor Fedotov <igor.fedo...@croit.io>
Sent: Thursday, 17 February 2022 16:01
To: Wissem MIMOUNA <wissem.mimo...@fiducialcloud.fr>
Subject: Re: [ceph-users] OSDs crash randomly



Wissem,

unfortunately there is no way to learn whether zombies have appeared other than running fsck. But I think this can be performed on a weekly or even monthly basis - from my experience, getting 32K zombies is a pretty rare case. But it's definitely more reliable if you collect those statistics from the cluster yourself...
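For example, a minimal sketch of such a periodic check for a single OSD could look like the one below. This assumes a systemd-managed (non-cephadm) deployment, the default data path /var/lib/ceph/osd/ceph-<id>, and that fsck's output mentions "zombie spanning blob" - that grep pattern is an assumption, so verify it against your own fsck output first:

    #!/bin/sh
    # Sketch: count zombie spanning blobs on one OSD.
    # fsck needs exclusive access to the store, so stop the OSD first.
    OSD_ID="$1"
    systemctl stop ceph-osd@"${OSD_ID}"
    ceph-bluestore-tool fsck --path "/var/lib/ceph/osd/ceph-${OSD_ID}" \
        > "/tmp/fsck-osd-${OSD_ID}.log" 2>&1
    # Assumed error wording - adjust the pattern to match real fsck output.
    ZOMBIES=$(grep -c "zombie spanning blob" "/tmp/fsck-osd-${OSD_ID}.log")
    echo "osd.${OSD_ID}: ${ZOMBIES} zombie spanning blob(s) detected"
    systemctl start ceph-osd@"${OSD_ID}"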







Thanks,
Igor

On 2/17/2022 5:43 PM, Wissem MIMOUNA wrote:

Hi Igor,

Thank you very much - this helped us understand the root cause, and we hope to get a fix soon (with a new Ceph release).

In the meantime, do you have any idea how to periodically check for zombie spanning blobs (before running fsck/repair)? It would be nice for us to automate this action.



Have a good day

Best Regards



From: Igor Fedotov <igor.fedo...@croit.io>
Sent: Thursday, 17 February 2022 11:59
To: Wissem MIMOUNA <wissem.mimo...@fiducialcloud.fr>; ceph-users@ceph.io
Subject: Re: [ceph-users] OSDs crash randomly



Hi Wissem,

first of all, the bug wasn't fixed by the PR you're referring to - it just added additional log output on problem detection.

Unfortunately the bug isn't fixed yet, as the root cause for the appearance of zombie spanning blobs is still unclear. The relevant ticket is

There is a workaround though - ceph-bluestore-tool's repair command would detect zombie spanning blobs and remove them, which should eliminate the assertion for a while.

I'd recommend running fsck/repair periodically, as it looks like your cluster is exposed to the problem and the zombies would likely come back - it's crucial to keep their count below 32K per PG to avoid the assertion.
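For illustration, applying the workaround to a single OSD could look like the sketch below (again assuming a systemd-managed, non-cephadm deployment and the default data path; ceph-bluestore-tool must run while the OSD is offline):

    # Optionally prevent data rebalancing while the OSD is down.
    ceph osd set noout

    # Stop the OSD so the tool gets exclusive access to the store.
    systemctl stop ceph-osd@0

    # repair detects zombie spanning blobs and removes them
    # (fsck alone only reports them).
    ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-0

    # Return the OSD to service.
    systemctl start ceph-osd@0
    ceph osd unset noout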







Thanks,
Igor

On 2/17/2022 1:41 PM, Wissem MIMOUNA wrote:



> Dear,
>
> Some OSDs on our Ceph cluster crash with no explanation.
> Stopping and starting the crashed OSD daemon fixed the issue, but this has happened a few times and I just need to understand the reason.
> For your information, the error has been fixed by the log change in the Octopus release (
>      "process_name": "ceph-osd",



>      "entity_name": "osd.x",



>      "ceph_version": "15.2.15",



>      "utsname_hostname": "",



>      "utsname_sysname": "Linux",



>      "utsname_release": "4.15.0-162-generic",



>      "utsname_version":



>      "utsname_machine": "x86_64",



>      "os_name": "Ubuntu",



>      "os_id": "ubuntu",



>      "os_version_id": "18.04",



>      "os_version": "18.04.6 LTS (Bionic Beaver)",



>      "assert_condition": "abort",



>      "assert_func": "bid_t BlueStore::ExtentMap::allocate_spanning_blob_id()",



>      "assert_file": "/build/ceph-15.2.15/src/os/bluestore/BlueStore.cc",



>      "assert_line": 2664,



>      "assert_thread_name": "tp_osd_tp",



>      "assert_msg": "/build/ceph-15.2.15/src/os/bluestore/BlueStore.cc: In 
> function 'bid_t BlueStore::ExtentMap::allocate_spanning_blob_id()' thread 
> 7f6d37800700 time 
> 2022-02-17T09:41:55.108101+0100\n/build/ceph-15.2.15/src/os/bluestore/BlueStore.cc:
>  2664: ceph_abort_msg(\"no available blob id\")\n",



>      "backtrace": [



>          "(()+0x12980) [0x7f6d59516980]",



>          "(gsignal()+0xc7) [0x7f6d581c8fb7]",



>          "(abort()+0x141) [0x7f6d581ca921]",



>          "(ceph::__ceph_abort(char const*, int, char const*, 
> std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> 
> > const&)+0x1b2) [0x55ddc61f245f]",



>          "(BlueStore::ExtentMap::allocate_spanning_blob_id()+0x104) 
> [0x55ddc674b594]",



>          "(BlueStore::ExtentMap::reshard(KeyValueDB*, 
> std::shared_ptr<KeyValueDB::TransactionImpl>)+0x1408) [0x55ddc674c9c8]",



>          "(BlueStore::_record_onode(boost::intrusive_ptr<BlueStore::Onode>&, 
> std::shared_ptr<KeyValueDB::TransactionImpl>&)+0x91c) [0x55ddc674f4ec]",



>          "(BlueStore::_txc_write_nodes(BlueStore::TransContext*, 
> std::shared_ptr<KeyValueDB::TransactionImpl>)+0x7e) [0x55ddc6751b4e]",



>          
> "(BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&,
>  std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, 
> boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x2fc) 
> [0x55ddc677892c]",



>          "(non-virtual thunk to 
> PrimaryLogPG::queue_transactions(std::vector<ceph::os::Transaction, 
> std::allocator<ceph::os::Transaction> >&, 
> boost::intrusive_ptr<OpRequest>)+0x54) [0x55ddc63eef44]",



>          "(ECBackend::handle_sub_write(pg_shard_t, 
> boost::intrusive_ptr<OpRequest>, ECSubWrite&, ZTracer::Trace const&)+0x9cd) 
> [0x55ddc65cb95d]",



>          "(ECBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x23d) 
> [0x55ddc65e3c2d]",



>          "(PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x97) 
> [0x55ddc643b157]",



>          "(PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, 
> ThreadPool::TPHandle&)+0x6fd) [0x55ddc63ddddd]",



>          "(OSD::dequeue_op(boost::intrusive_ptr<PG>, 
> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x17b) 
> [0x55ddc62618bb]",



>          "(ceph::osd::scheduler::PGOpItem::run(OSD*, OSDShard*, 
> boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x67) [0x55ddc64bf167]",



>          "(OSD::ShardedOpWQ::_process(unsigned int, 
> ceph::heartbeat_handle_d*)+0x90c) [0x55ddc627ef4c]",



>          "(ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x4ac) 
> [0x55ddc68d1d0c]",



>          "(ShardedThreadPool::WorkThreadSharded::entry()+0x10) 
> [0x55ddc68d4f60]",



>          "(()+0x76db) [0x7f6d5950b6db]",



>          "(clone()+0x3f) [0x7f6d582ab71f]"



>      ]



>
> Best Regards







--
Igor Fedotov
Ceph Lead Developer

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263



_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
