Hi, I got another failure, and this time I was able to investigate a bit.

1. If I delete the OSD and recreate it with the exact same setup, the OSD boots up successfully.
2. However, diffing the logs between the failed run and the successful one, I noticed something odd: https://www.diffchecker.com/sSHrxwC9
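As a quick sanity check before starting the OSD, I now verify that the block.wal symlink exists and points at the WAL device, recreating it the same way the successful runs do. This is only a rough sketch of my own; the helper name is mine, and the example paths are from my setup (adjust the OSD id and WAL device for yours):

```shell
#!/bin/sh
# Sanity check before OSD start: ensure block.wal points at the WAL device
# and is owned by ceph, mirroring the "ln -snf" / "chown -h" commands that
# appear in successful startups. Helper name is my own convention.
check_wal_link() {
    osd_dir="$1"   # e.g. /var/lib/ceph/osd/ceph-5
    wal_dev="$2"   # e.g. /dev/inaugurator/ed5a15e8-20b9-4312-991c-1a4d91b284bd-wal
    link="$osd_dir/block.wal"

    if [ ! -e "$wal_dev" ]; then
        echo "missing WAL device: $wal_dev" >&2
        return 1
    fi

    # Recreate the symlink if it is absent or points somewhere else
    if [ "$(readlink "$link" 2>/dev/null)" != "$wal_dev" ]; then
        ln -snf "$wal_dev" "$link"
        # chown needs root; ignore failure when probing as a regular user
        chown -h ceph:ceph "$wal_dev" 2>/dev/null || true
    fi

    echo "ok: $link -> $(readlink "$link")"
}
```

Run as, e.g., `check_wal_link /var/lib/ceph/osd/ceph-5 /dev/inaugurator/ed5a15e8-20b9-4312-991c-1a4d91b284bd-wal` before launching ceph-osd.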
In every successful OSD startup, the following lines are executed:

Running command: ln -snf /dev/inaugurator/ed5a15e8-20b9-4312-991c-1a4d91b284bd-wal /var/lib/ceph/osd/ceph-5/block.wal
Running command: chown -h ceph:ceph /dev/inaugurator/ed5a15e8-20b9-4312-991c-1a4d91b284bd-wal

However, in every failed run these two lines are missing. Any idea why this would occur?

Last but not least: I have set the log level to 20; however, it seems that BlueStore crashes before even getting to the point where things are logged.

Regards
Benoit

On Mon, 6 Aug 2018 at 13:07, Benoit Hudzia <ben...@stratoscale.com> wrote:

> Thanks, I'll try to check if I can reproduce it. It's really sporadic and
> occurs every 20-30 runs. I might check if it always occurs on the same
> server; maybe it's a HW issue.
>
> On Mon, 6 Aug 2018 at 06:12, Gregory Farnum <gfar...@redhat.com> wrote:
>
>> This isn't very complete, as it just indicates that something went wrong
>> with a read. Since I presume it happens on every startup, it may help if
>> you set "debug bluestore = 20" in the OSD's config and provide that log
>> (perhaps with ceph-post-file if it's large).
>> I also went through my email and see
>> https://tracker.ceph.com/issues/24639, if you have anything in common
>> with that deployment. (But you probably don't; an error on read generally
>> is about bad state on disk that was created somewhere else.)
>> -Greg
>>
>> On Sun, Aug 5, 2018 at 3:19 PM Benoit Hudzia <ben...@stratoscale.com>
>> wrote:
>>
>>> Hi,
>>>
>>> We started to see core dumps occurring with Luminous 12.2.7. Any idea
>>> where this is coming from? We started having issues with BlueStore core
>>> dumping when we moved to 12.2.6 and hoped that 12.2.7 would have fixed it.
>>> We might need to revert back to 12.2.5, as it seems a lot more stable.
>>>
>>> Pastebin link for full log: https://pastebin.com/na4E3m3N
>>>
>>> Core dump:
>>>
>>> starting osd.7 at - osd_data /var/lib/ceph/osd/ceph-7 /var/lib/ceph/osd/ceph-7/journal
>>> *** Caught signal (Segmentation fault) **
>>> in thread 7fa8830cfd80 thread_name:ceph-osd
>>> ceph version 12.2.7 (3ec878d1e53e1aeb47a9f619c49d9e7c0aa384d5) luminous (stable)
>>> 1: (()+0xa48ec1) [0x55e010afcec1]
>>> 2: (()+0xf6d0) [0x7fa8807966d0]
>>> 3: (BlueFS::_read(BlueFS::FileReader*, BlueFS::FileReaderBuffer*, unsigned long, unsigned long, ceph::buffer::list*, char*)+0x452) [0x55e010ab1e72]
>>> 4: (BlueFS::_replay(bool)+0x2ef) [0x55e010ac526f]
>>> 5: (BlueFS::mount()+0x1d4) [0x55e010ac8fd4]
>>> 6: (BlueStore::_open_db(bool)+0x1847) [0x55e0109e2da7]
>>> 7: (BlueStore::_mount(bool)+0x40e) [0x55e010a1406e]
>>> 8: (OSD::init()+0x3bd) [0x55e0105c934d]
>>> 9: (main()+0x2d07) [0x55e0104ce947]
>>> 10: (__libc_start_main()+0xf5) [0x7fa87f7a3445]
>>> 11: (()+0x4b9003) [0x55e01056d003]
>>> 2018-08-03 21:58:12.248736 7fa8830cfd80 -1 *** Caught signal (Segmentation fault) **
>>> in thread 7fa8830cfd80 thread_name:ceph-osd
>>>
>>> ceph version 12.2.7 (3ec878d1e53e1aeb47a9f619c49d9e7c0aa384d5) luminous (stable)
>>> 1: (()+0xa48ec1) [0x55e010afcec1]
>>> 2: (()+0xf6d0) [0x7fa8807966d0]
>>> 3: (BlueFS::_read(BlueFS::FileReader*, BlueFS::FileReaderBuffer*, unsigned long, unsigned long, ceph::buffer::list*, char*)+0x452) [0x55e010ab1e72]
>>> 4: (BlueFS::_replay(bool)+0x2ef) [0x55e010ac526f]
>>> 5: (BlueFS::mount()+0x1d4) [0x55e010ac8fd4]
>>> 6: (BlueStore::_open_db(bool)+0x1847) [0x55e0109e2da7]
>>> 7: (BlueStore::_mount(bool)+0x40e) [0x55e010a1406e]
>>> 8: (OSD::init()+0x3bd) [0x55e0105c934d]
>>> 9: (main()+0x2d07) [0x55e0104ce947]
>>> 10: (__libc_start_main()+0xf5) [0x7fa87f7a3445]
>>> 11: (()+0x4b9003) [0x55e01056d003]
>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>>
>>>      0> 2018-08-03 21:58:12.248736 7fa8830cfd80 -1 *** Caught signal (Segmentation fault) **
>>> in thread 7fa8830cfd80 thread_name:ceph-osd
>>>
>>> ceph version 12.2.7 (3ec878d1e53e1aeb47a9f619c49d9e7c0aa384d5) luminous (stable)
>>> 1: (()+0xa48ec1) [0x55e010afcec1]
>>> 2: (()+0xf6d0) [0x7fa8807966d0]
>>> 3: (BlueFS::_read(BlueFS::FileReader*, BlueFS::FileReaderBuffer*, unsigned long, unsigned long, ceph::buffer::list*, char*)+0x452) [0x55e010ab1e72]
>>> 4: (BlueFS::_replay(bool)+0x2ef) [0x55e010ac526f]
>>> 5: (BlueFS::mount()+0x1d4) [0x55e010ac8fd4]
>>> 6: (BlueStore::_open_db(bool)+0x1847) [0x55e0109e2da7]
>>> 7: (BlueStore::_mount(bool)+0x40e) [0x55e010a1406e]
>>> 8: (OSD::init()+0x3bd) [0x55e0105c934d]
>>> 9: (main()+0x2d07) [0x55e0104ce947]
>>> 10: (__libc_start_main()+0xf5) [0x7fa87f7a3445]
>>> 11: (()+0x4b9003) [0x55e01056d003]
>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>>
>>> /osd_entrypoint: line 98: 119388 Segmentation fault (core dumped) /usr/bin/ceph-osd -f --cluster "${CEPH_CLUSTERNAME}" --id "${OSD_ID}" --setuser root --setgroup root
>>>
>>> --
>>> Dr. Benoit Hudzia
>>>
>>> Mobile (UK): +44 (0) 75 346 78673
>>> Mobile (IE): +353 (0) 89 219 3675
>>> Email: ben...@stratoscale.com
>>>
>>> Web <http://www.stratoscale.com/> | Blog <http://www.stratoscale.com/blog/> | Twitter <https://twitter.com/Stratoscale> | Google+ <https://plus.google.com/u/1/b/108421603458396133912/108421603458396133912/posts> | Linkedin <https://www.linkedin.com/company/stratoscale>
>>>
>>> _______________________________________________
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com