AH!  Sorry for the false alarm, I clearly have a hard drive problem....

ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata2.00: BMDMA stat 0x24
ata2.00: failed command: READ DMA
ata2.00: cmd c8/00:08:38:bd:70/00:00:00:00:00/ef tag 0 dma 4096 in
         res 51/40:00:3f:bd:70/40:00:21:00:00/ef Emask 0x9 (media error)
ata2.00: status: { DRDY ERR }
ata2.00: error: { UNC }
ata2.00: configured for UDMA/133
ata2: EH complete
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata2.00: BMDMA stat 0x24
ata2.00: failed command: READ DMA
ata2.00: cmd c8/00:08:38:bd:70/00:00:00:00:00/ef tag 0 dma 4096 in
         res 51/40:00:3f:bd:70/40:00:21:00:00/ef Emask 0x9 (media error)
ata2.00: status: { DRDY ERR }
ata2.00: error: { UNC }
ata2.00: configured for UDMA/133
ata2: EH complete
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata2.00: BMDMA stat 0x24
ata2.00: failed command: READ DMA
ata2.00: cmd c8/00:08:38:bd:70/00:00:00:00:00/ef tag 0 dma 4096 in
         res 51/40:00:3f:bd:70/40:00:21:00:00/ef Emask 0x9 (media error)
ata2.00: status: { DRDY ERR }
ata2.00: error: { UNC }
ata2.00: configured for UDMA/133
ata2: EH complete
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata2.00: BMDMA stat 0x24
ata2.00: failed command: READ DMA

On Thu, Nov 13, 2014 at 9:28 PM, Sage Weil <s...@newdream.net> wrote:

> Hmm, looks like leveldb is hitting a problem.  Is there anything in the
> kernel log (dmesg) that suggests a disk or file system problem?  Are you
> able to, say, tar up the current/omap directory without problems?
>
> This is a single OSD, right?  None of the others have been upgraded yet?
> > sage > > > On Thu, 13 Nov 2014, Joshua McClintock wrote: > > > > > [root@ceph-node20 ~]# ls /var/lib/ceph/osd/us-west01-0/current > > > > 0.10_head 0.1a_head 0.23_head 0.2c_head 0.37_head 0.3_head > 0.b_head > > 1.10_head 1.1b_head 1.24_head 1.2c_head 1.3a_head 1.b_head > 2.16_head > > 2.1_head 2.2a_head 2.32_head 2.3a_head 2.a_head omap > > > > 0.11_head 0.1d_head 0.25_head 0.2e_head 0.38_head 0.4_head > 0.c_head > > 1.13_head 1.1d_head 1.26_head 1.2f_head 1.3b_head 1.e_head > 2.1a_head > > 2.22_head 2.2c_head 2.33_head 2.3e_head 2.b_head > > > > 0.13_head 0.1f_head 0.26_head 0.2f_head 0.3b_head 0.5_head > 0.d_head > > 1.16_head 1.1f_head 1.27_head 1.31_head 1.3e_head 2.0_head > 2.1b_head > > 2.25_head 2.2e_head 2.36_head 2.3f_head 2.c_head > > > > 0.16_head 0.20_head 0.27_head 0.30_head 0.3c_head 0.6_head > 0.e_head > > 1.18_head 1.20_head 1.29_head 1.36_head 1.3_head 2.10_head > 2.1c_head > > 2.26_head 2.2f_head 2.37_head 2.4_head commit_op_seq > > > > 0.18_head 0.21_head 0.28_head 0.33_head 0.3e_head 0.7_head > 0.f_head > > 1.19_head 1.22_head 1.2a_head 1.37_head 1.4_head 2.11_head > 2.1d_head > > 2.27_head 2.30_head 2.38_head 2.7_head meta > > > > 0.19_head 0.22_head 0.29_head 0.35_head 0.3f_head 0.9_head > 1.0_head > > 1.1a_head 1.23_head 1.2b_head 1.39_head 1.a_head 2.12_head > 2.1e_head > > 2.28_head 2.31_head 2.39_head 2.8_head nosnap > > > > > > The output from the other command was too long to post, here's the link > to > > the full dump: > > > > http://pastee.co/Kd1BlP > > > > Here's the last 100-200 lines: > > > > ... > > > > ... > > > > ... > > > > ... 
> > > > _HOBJTOSEQ_:pglog%u2%e2e...0.none.516B9E4C > > > > _HOBJTOSEQ_:pglog%u2%e2f...0.none.516B9F1C > > > > _HOBJTOSEQ_:pglog%u2%e30...0.none.516BFD4B > > > > _HOBJTOSEQ_:pglog%u2%e31...0.none.516BF21B > > > > _HOBJTOSEQ_:pglog%u2%e32...0.none.516BF3AB > > > > _HOBJTOSEQ_:pglog%u2%e33...0.none.516BF37B > > > > _HOBJTOSEQ_:pglog%u2%e36...0.none.516BF16B > > > > _HOBJTOSEQ_:pglog%u2%e37...0.none.516BF63B > > > > _HOBJTOSEQ_:pglog%u2%e38...0.none.516BF7CB > > > > _HOBJTOSEQ_:pglog%u2%e39...0.none.516BF49B > > > > _HOBJTOSEQ_:pglog%u2%e3a...0.none.516B933C > > > > _HOBJTOSEQ_:pglog%u2%e3e...0.none.516B96FC > > > > _HOBJTOSEQ_:pglog%u2%e3f...0.none.516B978C > > > > _HOBJTOSEQ_:pglog%u2%e4...0.none.103ABD8E > > > > _HOBJTOSEQ_:pglog%u2%e7...0.none.103AB3BE > > > > _HOBJTOSEQ_:pglog%u2%e8...0.none.103AB34E > > > > _HOBJTOSEQ_:pglog%u2%ea...0.none.103A5CBF > > > > _HOBJTOSEQ_:pglog%u2%eb...0.none.103A5C4F > > > > _HOBJTOSEQ_:pglog%u2%ec...0.none.103A5D1F > > > > _SYS_:HEADER > > > > *** Caught signal (Bus error) ** > > > > in thread 7f64e92ce760 > > > > ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3) > > > > 1: ceph-kvstore-tool() [0x4bf2e1] > > > > 2: (()+0xf710) [0x7f64e86e0710] > > > > 3: (leveldb::ReadBlock(leveldb::RandomAccessFile*, leveldb::ReadOptions > > const&, leveldb::BlockHandle const&, leveldb::BlockContents*)+0x1cb) > > [0x7f64e8e9f73b] > > > > 4: (leveldb::Table::BlockReader(void*, leveldb::ReadOptions const&, > > leveldb::Slice const&)+0x291) [0x7f64e8ea0de1] > > > > 5: (()+0x3a412) [0x7f64e8ea3412] > > > > 6: (()+0x3a6f8) [0x7f64e8ea36f8] > > > > 7: (()+0x3a78d) [0x7f64e8ea378d] > > > > 8: (()+0x3761a) [0x7f64e8ea061a] > > > > 9: (()+0x20fd2) [0x7f64e8e89fd2] > > > > 10: (LevelDBStore::LevelDBWholeSpaceIteratorImpl::next()+0x47) > [0x4ba417] > > > > 11: (StoreTool::traverse(std::string const&, bool, std::ostream*)+0x1da) > > [0x4b65fa] > > > > 12: (main()+0x2cc) [0x4b26fc] > > > > 13: (__libc_start_main()+0xfd) [0x7f64e77a2d1d] > > > > 
14: ceph-kvstore-tool() [0x4b21b9] > > > > 2014-11-13 21:19:18.318941 7f64e92ce760 -1 *** Caught signal (Bus error) > ** > > > > in thread 7f64e92ce760 > > > > > > ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3) > > > > 1: ceph-kvstore-tool() [0x4bf2e1] > > > > 2: (()+0xf710) [0x7f64e86e0710] > > > > 3: (leveldb::ReadBlock(leveldb::RandomAccessFile*, leveldb::ReadOptions > > const&, leveldb::BlockHandle const&, leveldb::BlockContents*)+0x1cb) > > [0x7f64e8e9f73b] > > > > 4: (leveldb::Table::BlockReader(void*, leveldb::ReadOptions const&, > > leveldb::Slice const&)+0x291) [0x7f64e8ea0de1] > > > > 5: (()+0x3a412) [0x7f64e8ea3412] > > > > 6: (()+0x3a6f8) [0x7f64e8ea36f8] > > > > 7: (()+0x3a78d) [0x7f64e8ea378d] > > > > 8: (()+0x3761a) [0x7f64e8ea061a] > > > > 9: (()+0x20fd2) [0x7f64e8e89fd2] > > > > 10: (LevelDBStore::LevelDBWholeSpaceIteratorImpl::next()+0x47) > [0x4ba417] > > > > 11: (StoreTool::traverse(std::string const&, bool, std::ostream*)+0x1da) > > [0x4b65fa] > > > > 12: (main()+0x2cc) [0x4b26fc] > > > > 13: (__libc_start_main()+0xfd) [0x7f64e77a2d1d] > > > > 14: ceph-kvstore-tool() [0x4b21b9] > > > > NOTE: a copy of the executable, or `objdump -rdS <executable>` is > needed to > > interpret this. 
> > > > > > --- begin dump of recent events --- > > > > -13> 2014-11-13 21:19:04.689722 7f64e92ce760 5 asok(0x1f1b5b0) > > register_command perfcounters_dump hook 0x1f1b510 > > > > -12> 2014-11-13 21:19:04.689754 7f64e92ce760 5 asok(0x1f1b5b0) > > register_command 1 hook 0x1f1b510 > > > > -11> 2014-11-13 21:19:04.689771 7f64e92ce760 5 asok(0x1f1b5b0) > > register_command perf dump hook 0x1f1b510 > > > > -10> 2014-11-13 21:19:04.689778 7f64e92ce760 5 asok(0x1f1b5b0) > > register_command perfcounters_schema hook 0x1f1b510 > > > > -9> 2014-11-13 21:19:04.689787 7f64e92ce760 5 asok(0x1f1b5b0) > > register_command 2 hook 0x1f1b510 > > > > -8> 2014-11-13 21:19:04.689793 7f64e92ce760 5 asok(0x1f1b5b0) > > register_command perf schema hook 0x1f1b510 > > > > -7> 2014-11-13 21:19:04.689803 7f64e92ce760 5 asok(0x1f1b5b0) > > register_command config show hook 0x1f1b510 > > > > -6> 2014-11-13 21:19:04.689811 7f64e92ce760 5 asok(0x1f1b5b0) > > register_command config set hook 0x1f1b510 > > > > -5> 2014-11-13 21:19:04.689818 7f64e92ce760 5 asok(0x1f1b5b0) > > register_command config get hook 0x1f1b510 > > > > -4> 2014-11-13 21:19:04.689821 7f64e92ce760 5 asok(0x1f1b5b0) > > register_command log flush hook 0x1f1b510 > > > > -3> 2014-11-13 21:19:04.689831 7f64e92ce760 5 asok(0x1f1b5b0) > > register_command log dump hook 0x1f1b510 > > > > -2> 2014-11-13 21:19:04.689837 7f64e92ce760 5 asok(0x1f1b5b0) > > register_command log reopen hook 0x1f1b510 > > > > -1> 2014-11-13 21:19:04.689940 7f64e92ce760 -1 did not load config > file, > > using default settings. 
> > > > 0> 2014-11-13 21:19:18.318941 7f64e92ce760 -1 *** Caught signal (Bus > > error) ** > > > > in thread 7f64e92ce760 > > > > > > ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3) > > > > 1: ceph-kvstore-tool() [0x4bf2e1] > > > > 2: (()+0xf710) [0x7f64e86e0710] > > > > 3: (leveldb::ReadBlock(leveldb::RandomAccessFile*, leveldb::ReadOptions > > const&, leveldb::BlockHandle const&, leveldb::BlockContents*)+0x1cb) > > [0x7f64e8e9f73b] > > > > 4: (leveldb::Table::BlockReader(void*, leveldb::ReadOptions const&, > > leveldb::Slice const&)+0x291) [0x7f64e8ea0de1] > > > > 5: (()+0x3a412) [0x7f64e8ea3412] > > > > 6: (()+0x3a6f8) [0x7f64e8ea36f8] > > > > 7: (()+0x3a78d) [0x7f64e8ea378d] > > > > 8: (()+0x3761a) [0x7f64e8ea061a] > > > > 9: (()+0x20fd2) [0x7f64e8e89fd2] > > > > 10: (LevelDBStore::LevelDBWholeSpaceIteratorImpl::next()+0x47) > [0x4ba417] > > > > 11: (StoreTool::traverse(std::string const&, bool, std::ostream*)+0x1da) > > [0x4b65fa] > > > > 12: (main()+0x2cc) [0x4b26fc] > > > > 13: (__libc_start_main()+0xfd) [0x7f64e77a2d1d] > > > > 14: ceph-kvstore-tool() [0x4b21b9] > > > > NOTE: a copy of the executable, or `objdump -rdS <executable>` is > needed to > > interpret this. 
> > > > > > --- logging levels --- > > > > 0/ 5 none > > > > 0/ 1 lockdep > > > > 0/ 1 context > > > > 1/ 1 crush > > > > 1/ 5 mds > > > > 1/ 5 mds_balancer > > > > 1/ 5 mds_locker > > > > 1/ 5 mds_log > > > > 1/ 5 mds_log_expire > > > > 1/ 5 mds_migrator > > > > 0/ 1 buffer > > > > 0/ 1 timer > > > > 0/ 1 filer > > > > 0/ 1 striper > > > > 0/ 1 objecter > > > > 0/ 5 rados > > > > 0/ 5 rbd > > > > 0/ 5 journaler > > > > 0/ 5 objectcacher > > > > 0/ 5 client > > > > 0/ 5 osd > > > > 0/ 5 optracker > > > > 0/ 5 objclass > > > > 1/ 3 filestore > > > > 1/ 3 keyvaluestore > > > > 1/ 3 journal > > > > 0/ 5 ms > > > > 1/ 5 mon > > > > 0/10 monc > > > > 1/ 5 paxos > > > > 0/ 5 tp > > > > 1/ 5 auth > > > > 1/ 5 crypto > > > > 1/ 1 finisher > > > > 1/ 5 heartbeatmap > > > > 1/ 5 perfcounter > > > > 1/ 5 rgw > > > > 1/ 5 javaclient > > > > 1/ 5 asok > > > > 1/ 1 throttle > > > > -2/-2 (syslog threshold) > > > > 99/99 (stderr threshold) > > > > max_recent 500 > > > > max_new 1000 > > > > log_file > > > > --- end dump of recent events --- > > > > Bus error > > > > > > > > Joshua > > > > > > > > On Thu, Nov 13, 2014 at 8:52 PM, Sage Weil <s...@newdream.net> wrote: > > On Thu, 13 Nov 2014, Joshua McClintock wrote: > > > I upgraded my mons to the latest version and they appear to > > work, I then > > > upgraded my mds and it seems fine. 
> > > I then upgraded one OSD node and the OSD fails to start with > > the following > > > dump, any help is appreciated: > > > > > > --- begin dump of recent events --- > > > > > > 0> 2014-11-13 18:20:15.625793 7fbd973ce7a0 -1 *** Caught > > signal > > > (Aborted) ** > > > > > > in thread 7fbd973ce7a0 > > > > > > > > > ceph version 0.80.7 > > (6c0127fcb58008793d3c8b62d925bc91963672a3) > > > 1: /usr/bin/ceph-osd() [0x9bd2a1] > > > 2: (()+0xf710) [0x7fbd96373710] > > > 3: (gsignal()+0x35) [0x7fbd95245925] > > > 4: (abort()+0x175) [0x7fbd95247105] > > > 5: (__gnu_cxx::__verbose_terminate_handler()+0x12d) > > [0x7fbd95affa5d] > > > 6: (()+0xbcbe6) [0x7fbd95afdbe6] > > > 7: (()+0xbcc13) [0x7fbd95afdc13] > > > 8: (()+0xbcd0e) [0x7fbd95afdd0e] > > > 9: (ceph::__ceph_assert_fail(char const*, char const*, int, > > char > > > const*)+0x7f2) [0xafbe22] > > > 10: (PG::peek_map_epoch(ObjectStore*, coll_t, hobject_t&, > > > ceph::buffer::list*)+0x4ea) [0x7f729a] > > > 11: (OSD::load_pgs()+0x18f1) [0x64f2b1] > > > 12: (OSD::init()+0x22c0) [0x6536f0] > > > 13: (main()+0x35bc) [0x5fe39c] > > > 14: (__libc_start_main()+0xfd) [0x7fbd95231d1d] > > > 15: /usr/bin/ceph-osd() [0x5f9e49] > > > NOTE: a copy of the executable, or `objdump -rdS > > <executable>` is needed to > > > interpret this. > > > > Hey, this looks like a different report we saw recently off-list! In > > that > > case, they were upgrading from 0.80.4 to 0.80.7. Opening #10105. > > > > You only restarting a single OSD? I would hold off on restarting any > > more > > for the time being. > > > > Can you attach the output from > > > > ls /var/lib/ceph/osd/ceph-NNN/current > > and > > ceph-kvstore-tool /var/lib/ceph/osd/ceph-NNN/current/omap list > > > > (you may need to install the ceph-tests rpm to get ceph-kvstore-tool). > > > > Thanks! 
> > sage > > > > > > > > > > > > > --- logging levels --- > > > > > > 0/ 5 none > > > > > > 0/ 1 lockdep > > > > > > 0/ 1 context > > > > > > 1/ 1 crush > > > > > > 1/ 5 mds > > > > > > 1/ 5 mds_balancer > > > > > > 1/ 5 mds_locker > > > > > > 1/ 5 mds_log > > > > > > 1/ 5 mds_log_expire > > > > > > 1/ 5 mds_migrator > > > > > > 0/ 1 buffer > > > > > > 0/ 1 timer > > > > > > 0/ 1 filer > > > > > > 0/ 1 striper > > > > > > 0/ 1 objecter > > > > > > 0/ 5 rados > > > > > > 0/ 5 rbd > > > > > > 0/ 5 journaler > > > > > > 0/ 5 objectcacher > > > > > > 0/ 5 client > > > > > > 0/ 5 osd > > > > > > 0/ 5 optracker > > > > > > 0/ 5 objclass > > > > > > 1/ 3 filestore > > > > > > 1/ 3 keyvaluestore > > > > > > 1/ 3 journal > > > > > > 0/ 5 ms > > > > > > 1/ 5 mon > > > > > > 0/10 monc > > > > > > 1/ 5 paxos > > > > > > 0/ 5 tp > > > > > > 1/ 5 auth > > > > > > 1/ 5 crypto > > > > > > 1/ 1 finisher > > > > > > 1/ 5 heartbeatmap > > > > > > 1/ 5 perfcounter > > > > > > 1/ 5 rgw > > > > > > 1/ 5 javaclient > > > > > > 1/ 5 asok > > > > > > 1/ 1 throttle > > > > > > -2/-2 (syslog threshold) > > > > > > -1/-1 (stderr threshold) > > > > > > max_recent 10000 > > > > > > max_new 1000 > > > > > > log_file /var/log/ceph/us-west01-osd.0.log > > > > > > --- end dump of recent events --- > > > > > > > > > > > > > > > > > >
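For anyone who finds this thread with the same Bus error in leveldb: the kernel messages at the top of this message are what pointed to the disk. Below is a minimal sketch of how such output could be scanned for media errors per ATA device. The function name `count_media_errors` and the regular expression are assumptions for illustration, based only on the message shapes quoted in this thread; other kernels or drivers may format these lines differently.

```python
import re
from collections import Counter

# Assumed shape: an optional "[timestamp]" prefix, then "ataN:" or "ataN.MM:".
DEVICE = re.compile(r"^(?:\[[^\]]*\]\s*)?(ata\d+(?:\.\d+)?):")


def count_media_errors(lines):
    """Count '(media error)' reports, attributed to the last ATA device seen.

    The 'res ... (media error)' line is a continuation line without its own
    device prefix, so the most recent prefixed line supplies the device name.
    """
    counts = Counter()
    current = None
    for line in lines:
        match = DEVICE.match(line.strip())
        if match:
            current = match.group(1)
        if "(media error)" in line and current is not None:
            counts[current] += 1
    return counts


# Sample lines copied from the kernel log quoted in this thread.
sample = [
    "ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0",
    "ata2.00: BMDMA stat 0x24",
    "ata2.00: failed command: READ DMA",
    "ata2.00: cmd c8/00:08:38:bd:70/00:00:00:00:00/ef tag 0 dma 4096 in",
    "         res 51/40:00:3f:bd:70/40:00:21:00:00/ef Emask 0x9 (media error)",
    "ata2.00: status: { DRDY ERR }",
    "ata2.00: error: { UNC }",
    "ata2: EH complete",
]

print(dict(count_media_errors(sample)))  # {'ata2.00': 1}
```

To run it against a live system, the same function could be fed the lines of saved `dmesg` output instead of the sample list. A non-zero count for the device backing the OSD suggests checking the disk before suspecting the Ceph upgrade.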
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com