Hi Igor,

please find the startup log under the following link: https://we.tl/t-E6CadpW1ZL

It also includes the “normal” log of that OSD from the day before the crash and the RocksDB sst file with the “Bad table magic number” (db/001922.sst).
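In case it helps with reproducing, a minimal sketch of how such an export and the magic-number check can be done (the OSD id and paths below are placeholders, not the actual ones):

    # Export the OSD's BlueFS files (the embedded RocksDB) to a plain
    # directory; osd id and output dir are placeholders.
    ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-0 \
        --out-dir /tmp/osd0-db bluefs-export

    # Check the suspect table with RocksDB's sst_dump; with a bad table
    # magic number, opening the file should already fail.
    sst_dump --file=/tmp/osd0-db/db/001922.sst --command=verify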
Best regards,
Sebastian


> On 21.02.2022, at 12:18, Igor Fedotov <igor.fedo...@croit.io> wrote:
>
> Hi Sebastian,
>
> could you please share the failing OSD startup log?
>
>
> Thanks,
>
> Igor
>
> On 2/20/2022 5:10 PM, Sebastian Mazza wrote:
>> Hi Igor,
>>
>> it happened again. One of the OSDs that crashed last time has a
>> corrupted RocksDB again. Unfortunately, I again do not have debug logs
>> from the OSDs. I collected hundreds of gigabytes of OSD debug logs over
>> the last two months, but this week I disabled debug logging, because I
>> did some tests with rsync to CephFS and to RBD images on EC pools, and
>> the logs filled up my boot drives multiple times.
>> The corruption happened after I shut down all 3 nodes and booted them
>> a few minutes later.
>>
>> If you are interested, I can share the normal log of the OSD, a log of
>> a failed OSD start with debug logging enabled, and also the corrupted
>> RocksDB export.
>>
>> It may be worth noting that no crash happened after hundreds of
>> reboots, but now it happened after I gracefully shut down all nodes
>> for around 10 minutes.
>> To the best of my knowledge there was no IO on the crashed OSD for
>> several hours. The crashed OSD was used by only two pools, both EC
>> pools: one as the data pool of an RBD image and one as the data pool
>> for a subdirectory of a CephFS. All metadata for the CephFS and the
>> RBD pool is stored on replicated NVMes.
>> An RBD image on the HDD EC pool was mounted by a VM, but not as a boot
>> drive. The CephFS was also mounted by this VM and by the 3 cluster
>> nodes themselves. Apart from mounting/unmounting, neither the CephFS
>> nor the BTRFS on the RBD image was asked to process any IO. So nothing
>> was reading from or writing to the failed OSD for many hours before
>> the cluster shutdown and the OSD failure.
>>
>> I'm now thinking about how I could add more storage space for the log
>> files on each node, so that I can leave debug logging enabled all the
>> time.
>>
>> Best regards,
>> Sebastian
>
> --
> Igor Fedotov
> Ceph Lead Developer
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH, Freseniusstr. 31h, 81247 Munich
> CEO: Martin Verges - VAT-ID: DE310638492
> Com. register: Amtsgericht Munich HRB 231263
> Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx
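Regarding the note above about leaving debug logging enabled all the time: a minimal sketch of how that could be configured, assuming a larger volume is mounted at /var/log/ceph-debug (the path and log levels are assumptions, not taken from this thread):

    # Write OSD logs to the larger volume; $cluster and $name are Ceph
    # metavariables, hence the single quotes.
    ceph config set osd log_file '/var/log/ceph-debug/$cluster-$name.log'

    # Verbose BlueStore/BlueFS/RocksDB logging (this is what fills the
    # space, so keep an eye on log rotation).
    ceph config set osd debug_bluestore 20/20
    ceph config set osd debug_bluefs 20/20
    ceph config set osd debug_rocksdb 20/20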