Ah! Sorry for the false alarm. I clearly have a hard drive problem; the kernel log shows:

ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata2.00: BMDMA stat 0x24
ata2.00: failed command: READ DMA
ata2.00: cmd c8/00:08:38:bd:70/00:00:00:00:00/ef tag 0 dma 4096 in
         res 51/40:00:3f:bd:70/40:00:21:00:00/ef Emask 0x9 (media error)
ata2.00: status: { DRDY ERR }
ata2.00: error: { UNC }
ata2.00: configured for UDMA/133
ata2: EH complete
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata2.00: BMDMA stat 0x24
ata2.00: failed command: READ DMA
ata2.00: cmd c8/00:08:38:bd:70/00:00:00:00:00/ef tag 0 dma 4096 in
         res 51/40:00:3f:bd:70/40:00:21:00:00/ef Emask 0x9 (media error)
ata2.00: status: { DRDY ERR }
ata2.00: error: { UNC }
ata2.00: configured for UDMA/133
ata2: EH complete
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata2.00: BMDMA stat 0x24
ata2.00: failed command: READ DMA
ata2.00: cmd c8/00:08:38:bd:70/00:00:00:00:00/ef tag 0 dma 4096 in
         res 51/40:00:3f:bd:70/40:00:21:00:00/ef Emask 0x9 (media error)
ata2.00: status: { DRDY ERR }
ata2.00: error: { UNC }
ata2.00: configured for UDMA/133
ata2: EH complete
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata2.00: BMDMA stat 0x24
ata2.00: failed command: READ DMA
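
For anyone who lands on this thread with a similar crash, here is a minimal sketch of how one might scan saved kernel log text for libata media errors like the ones above. It is not part of the original exchange: the function name `find_media_errors` is hypothetical, and it only recognizes the line shapes shown in this message (an optional `[timestamp]` prefix is tolerated as an assumption about other dmesg formats).

```python
import re
from collections import Counter

# Lines such as "ata2.00: failed command: READ DMA", optionally preceded
# by a "[  123.456789]" timestamp (assumed; the paste above has none).
DEVICE_LINE = re.compile(r"^(?:\[\s*\d+\.\d+\]\s*)?(ata\d+\.\d+): ")
# The "res ..." continuation line ends with the decoded error class.
MEDIA_ERROR = re.compile(r"Emask 0x[0-9a-f]+ \(media error\)")


def find_media_errors(log_text):
    """Count media-error results per ATA device in kernel log text.

    The "res ..." line carries no device prefix, so it is attributed to
    the most recent "ataN.NN:" line seen before it.
    """
    counts = Counter()
    current_device = None
    for raw in log_text.splitlines():
        line = raw.strip()
        match = DEVICE_LINE.match(line)
        if match:
            current_device = match.group(1)
            continue
        if MEDIA_ERROR.search(line) and current_device is not None:
            counts[current_device] += 1
    return dict(counts)


if __name__ == "__main__":
    sample = """\
ata2.00: failed command: READ DMA
ata2.00: cmd c8/00:08:38:bd:70/00:00:00:00:00/ef tag 0 dma 4096 in
         res 51/40:00:3f:bd:70/40:00:21:00:00/ef Emask 0x9 (media error)
ata2.00: status: { DRDY ERR }
ata2.00: error: { UNC }
"""
    print(find_media_errors(sample))
```

A nonzero count for a device is only a hint that its disk deserves a closer look; it does not replace checking the drive itself.
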

On Thu, Nov 13, 2014 at 9:28 PM, Sage Weil <s...@newdream.net> wrote:

> Hmm, looks like leveldb is hitting a problem.  Is there anything in the
> kernel log (dmesg) that suggests a disk or file system problem?  Are you
> able to, say, tar up the current/omap directory without problems?
>
> This is a single OSD, right?  None of the others have been upgraded yet?
>
> sage
>
>
> On Thu, 13 Nov 2014, Joshua McClintock wrote:
>
> >
> > [root@ceph-node20 ~]# ls /var/lib/ceph/osd/us-west01-0/current
> >
> > 0.10_head  0.1a_head  0.23_head  0.2c_head  0.37_head  0.3_head   0.b_head
> > 1.10_head  1.1b_head  1.24_head  1.2c_head  1.3a_head  1.b_head   2.16_head
> > 2.1_head   2.2a_head  2.32_head  2.3a_head  2.a_head       omap
> >
> > 0.11_head  0.1d_head  0.25_head  0.2e_head  0.38_head  0.4_head   0.c_head
> > 1.13_head  1.1d_head  1.26_head  1.2f_head  1.3b_head  1.e_head   2.1a_head
> > 2.22_head  2.2c_head  2.33_head  2.3e_head  2.b_head
> >
> > 0.13_head  0.1f_head  0.26_head  0.2f_head  0.3b_head  0.5_head   0.d_head
> > 1.16_head  1.1f_head  1.27_head  1.31_head  1.3e_head  2.0_head   2.1b_head
> > 2.25_head  2.2e_head  2.36_head  2.3f_head  2.c_head
> >
> > 0.16_head  0.20_head  0.27_head  0.30_head  0.3c_head  0.6_head   0.e_head
> > 1.18_head  1.20_head  1.29_head  1.36_head  1.3_head   2.10_head  2.1c_head
> > 2.26_head  2.2f_head  2.37_head  2.4_head   commit_op_seq
> >
> > 0.18_head  0.21_head  0.28_head  0.33_head  0.3e_head  0.7_head   0.f_head
> > 1.19_head  1.22_head  1.2a_head  1.37_head  1.4_head   2.11_head  2.1d_head
> > 2.27_head  2.30_head  2.38_head  2.7_head   meta
> >
> > 0.19_head  0.22_head  0.29_head  0.35_head  0.3f_head  0.9_head   1.0_head
> > 1.1a_head  1.23_head  1.2b_head  1.39_head  1.a_head   2.12_head  2.1e_head
> > 2.28_head  2.31_head  2.39_head  2.8_head   nosnap
> >
> >
> > The output from the other command was too long to post, here's the link
> > to the full dump:
> >
> > http://pastee.co/Kd1BlP
> >
> > Here's the last 100-200 lines:
> >
> > ...
> >
> > ...
> >
> > ...
> >
> > ...
> >
> > _HOBJTOSEQ_:pglog%u2%e2e...0.none.516B9E4C
> >
> > _HOBJTOSEQ_:pglog%u2%e2f...0.none.516B9F1C
> >
> > _HOBJTOSEQ_:pglog%u2%e30...0.none.516BFD4B
> >
> > _HOBJTOSEQ_:pglog%u2%e31...0.none.516BF21B
> >
> > _HOBJTOSEQ_:pglog%u2%e32...0.none.516BF3AB
> >
> > _HOBJTOSEQ_:pglog%u2%e33...0.none.516BF37B
> >
> > _HOBJTOSEQ_:pglog%u2%e36...0.none.516BF16B
> >
> > _HOBJTOSEQ_:pglog%u2%e37...0.none.516BF63B
> >
> > _HOBJTOSEQ_:pglog%u2%e38...0.none.516BF7CB
> >
> > _HOBJTOSEQ_:pglog%u2%e39...0.none.516BF49B
> >
> > _HOBJTOSEQ_:pglog%u2%e3a...0.none.516B933C
> >
> > _HOBJTOSEQ_:pglog%u2%e3e...0.none.516B96FC
> >
> > _HOBJTOSEQ_:pglog%u2%e3f...0.none.516B978C
> >
> > _HOBJTOSEQ_:pglog%u2%e4...0.none.103ABD8E
> >
> > _HOBJTOSEQ_:pglog%u2%e7...0.none.103AB3BE
> >
> > _HOBJTOSEQ_:pglog%u2%e8...0.none.103AB34E
> >
> > _HOBJTOSEQ_:pglog%u2%ea...0.none.103A5CBF
> >
> > _HOBJTOSEQ_:pglog%u2%eb...0.none.103A5C4F
> >
> > _HOBJTOSEQ_:pglog%u2%ec...0.none.103A5D1F
> >
> > _SYS_:HEADER
> >
> > *** Caught signal (Bus error) **
> >
> >  in thread 7f64e92ce760
> >
> >  ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
> >
> >  1: ceph-kvstore-tool() [0x4bf2e1]
> >
> >  2: (()+0xf710) [0x7f64e86e0710]
> >
> >  3: (leveldb::ReadBlock(leveldb::RandomAccessFile*, leveldb::ReadOptions
> > const&, leveldb::BlockHandle const&, leveldb::BlockContents*)+0x1cb)
> > [0x7f64e8e9f73b]
> >
> >  4: (leveldb::Table::BlockReader(void*, leveldb::ReadOptions const&,
> > leveldb::Slice const&)+0x291) [0x7f64e8ea0de1]
> >
> >  5: (()+0x3a412) [0x7f64e8ea3412]
> >
> >  6: (()+0x3a6f8) [0x7f64e8ea36f8]
> >
> >  7: (()+0x3a78d) [0x7f64e8ea378d]
> >
> >  8: (()+0x3761a) [0x7f64e8ea061a]
> >
> >  9: (()+0x20fd2) [0x7f64e8e89fd2]
> >
> >  10: (LevelDBStore::LevelDBWholeSpaceIteratorImpl::next()+0x47) [0x4ba417]
> >
> >  11: (StoreTool::traverse(std::string const&, bool, std::ostream*)+0x1da)
> > [0x4b65fa]
> >
> >  12: (main()+0x2cc) [0x4b26fc]
> >
> >  13: (__libc_start_main()+0xfd) [0x7f64e77a2d1d]
> >
> >  14: ceph-kvstore-tool() [0x4b21b9]
> >
> > 2014-11-13 21:19:18.318941 7f64e92ce760 -1 *** Caught signal (Bus error) **
> >
> >  in thread 7f64e92ce760
> >
> >
> >  ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
> >
> >  1: ceph-kvstore-tool() [0x4bf2e1]
> >
> >  2: (()+0xf710) [0x7f64e86e0710]
> >
> >  3: (leveldb::ReadBlock(leveldb::RandomAccessFile*, leveldb::ReadOptions
> > const&, leveldb::BlockHandle const&, leveldb::BlockContents*)+0x1cb)
> > [0x7f64e8e9f73b]
> >
> >  4: (leveldb::Table::BlockReader(void*, leveldb::ReadOptions const&,
> > leveldb::Slice const&)+0x291) [0x7f64e8ea0de1]
> >
> >  5: (()+0x3a412) [0x7f64e8ea3412]
> >
> >  6: (()+0x3a6f8) [0x7f64e8ea36f8]
> >
> >  7: (()+0x3a78d) [0x7f64e8ea378d]
> >
> >  8: (()+0x3761a) [0x7f64e8ea061a]
> >
> >  9: (()+0x20fd2) [0x7f64e8e89fd2]
> >
> >  10: (LevelDBStore::LevelDBWholeSpaceIteratorImpl::next()+0x47) [0x4ba417]
> >
> >  11: (StoreTool::traverse(std::string const&, bool, std::ostream*)+0x1da)
> > [0x4b65fa]
> >
> >  12: (main()+0x2cc) [0x4b26fc]
> >
> >  13: (__libc_start_main()+0xfd) [0x7f64e77a2d1d]
> >
> >  14: ceph-kvstore-tool() [0x4b21b9]
> >
> >  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
> > interpret this.
> >
> >
> > --- begin dump of recent events ---
> >
> >    -13> 2014-11-13 21:19:04.689722 7f64e92ce760  5 asok(0x1f1b5b0)
> > register_command perfcounters_dump hook 0x1f1b510
> >
> >    -12> 2014-11-13 21:19:04.689754 7f64e92ce760  5 asok(0x1f1b5b0)
> > register_command 1 hook 0x1f1b510
> >
> >    -11> 2014-11-13 21:19:04.689771 7f64e92ce760  5 asok(0x1f1b5b0)
> > register_command perf dump hook 0x1f1b510
> >
> >    -10> 2014-11-13 21:19:04.689778 7f64e92ce760  5 asok(0x1f1b5b0)
> > register_command perfcounters_schema hook 0x1f1b510
> >
> >     -9> 2014-11-13 21:19:04.689787 7f64e92ce760  5 asok(0x1f1b5b0)
> > register_command 2 hook 0x1f1b510
> >
> >     -8> 2014-11-13 21:19:04.689793 7f64e92ce760  5 asok(0x1f1b5b0)
> > register_command perf schema hook 0x1f1b510
> >
> >     -7> 2014-11-13 21:19:04.689803 7f64e92ce760  5 asok(0x1f1b5b0)
> > register_command config show hook 0x1f1b510
> >
> >     -6> 2014-11-13 21:19:04.689811 7f64e92ce760  5 asok(0x1f1b5b0)
> > register_command config set hook 0x1f1b510
> >
> >     -5> 2014-11-13 21:19:04.689818 7f64e92ce760  5 asok(0x1f1b5b0)
> > register_command config get hook 0x1f1b510
> >
> >     -4> 2014-11-13 21:19:04.689821 7f64e92ce760  5 asok(0x1f1b5b0)
> > register_command log flush hook 0x1f1b510
> >
> >     -3> 2014-11-13 21:19:04.689831 7f64e92ce760  5 asok(0x1f1b5b0)
> > register_command log dump hook 0x1f1b510
> >
> >     -2> 2014-11-13 21:19:04.689837 7f64e92ce760  5 asok(0x1f1b5b0)
> > register_command log reopen hook 0x1f1b510
> >
> >     -1> 2014-11-13 21:19:04.689940 7f64e92ce760 -1 did not load config file,
> > using default settings.
> >
> >      0> 2014-11-13 21:19:18.318941 7f64e92ce760 -1 *** Caught signal (Bus
> > error) **
> >
> >  in thread 7f64e92ce760
> >
> >
> >  ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
> >
> >  1: ceph-kvstore-tool() [0x4bf2e1]
> >
> >  2: (()+0xf710) [0x7f64e86e0710]
> >
> >  3: (leveldb::ReadBlock(leveldb::RandomAccessFile*, leveldb::ReadOptions
> > const&, leveldb::BlockHandle const&, leveldb::BlockContents*)+0x1cb)
> > [0x7f64e8e9f73b]
> >
> >  4: (leveldb::Table::BlockReader(void*, leveldb::ReadOptions const&,
> > leveldb::Slice const&)+0x291) [0x7f64e8ea0de1]
> >
> >  5: (()+0x3a412) [0x7f64e8ea3412]
> >
> >  6: (()+0x3a6f8) [0x7f64e8ea36f8]
> >
> >  7: (()+0x3a78d) [0x7f64e8ea378d]
> >
> >  8: (()+0x3761a) [0x7f64e8ea061a]
> >
> >  9: (()+0x20fd2) [0x7f64e8e89fd2]
> >
> >  10: (LevelDBStore::LevelDBWholeSpaceIteratorImpl::next()+0x47) [0x4ba417]
> >
> >  11: (StoreTool::traverse(std::string const&, bool, std::ostream*)+0x1da)
> > [0x4b65fa]
> >
> >  12: (main()+0x2cc) [0x4b26fc]
> >
> >  13: (__libc_start_main()+0xfd) [0x7f64e77a2d1d]
> >
> >  14: ceph-kvstore-tool() [0x4b21b9]
> >
> >  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
> > interpret this.
> >
> >
> > --- logging levels ---
> >
> >    0/ 5 none
> >
> >    0/ 1 lockdep
> >
> >    0/ 1 context
> >
> >    1/ 1 crush
> >
> >    1/ 5 mds
> >
> >    1/ 5 mds_balancer
> >
> >    1/ 5 mds_locker
> >
> >    1/ 5 mds_log
> >
> >    1/ 5 mds_log_expire
> >
> >    1/ 5 mds_migrator
> >
> >    0/ 1 buffer
> >
> >    0/ 1 timer
> >
> >    0/ 1 filer
> >
> >    0/ 1 striper
> >
> >    0/ 1 objecter
> >
> >    0/ 5 rados
> >
> >    0/ 5 rbd
> >
> >    0/ 5 journaler
> >
> >    0/ 5 objectcacher
> >
> >    0/ 5 client
> >
> >    0/ 5 osd
> >
> >    0/ 5 optracker
> >
> >    0/ 5 objclass
> >
> >    1/ 3 filestore
> >
> >    1/ 3 keyvaluestore
> >
> >    1/ 3 journal
> >
> >    0/ 5 ms
> >
> >    1/ 5 mon
> >
> >    0/10 monc
> >
> >    1/ 5 paxos
> >
> >    0/ 5 tp
> >
> >    1/ 5 auth
> >
> >    1/ 5 crypto
> >
> >    1/ 1 finisher
> >
> >    1/ 5 heartbeatmap
> >
> >    1/ 5 perfcounter
> >
> >    1/ 5 rgw
> >
> >    1/ 5 javaclient
> >
> >    1/ 5 asok
> >
> >    1/ 1 throttle
> >
> >   -2/-2 (syslog threshold)
> >
> >   99/99 (stderr threshold)
> >
> >   max_recent       500
> >
> >   max_new         1000
> >
> >   log_file
> >
> > --- end dump of recent events ---
> >
> > Bus error
> >
> >
> >
> > Joshua
> >
> >
> >
> > On Thu, Nov 13, 2014 at 8:52 PM, Sage Weil <s...@newdream.net> wrote:
> >       On Thu, 13 Nov 2014, Joshua McClintock wrote:
> >       > I upgraded my mons to the latest version and they appear to work, I then
> >       > upgraded my mds and it seems fine.
> >       > I then upgraded one OSD node and the OSD fails to start with the following
> >       > dump, any help is appreciated:
> >       >
> >       > --- begin dump of recent events ---
> >       >
> >       >      0> 2014-11-13 18:20:15.625793 7fbd973ce7a0 -1 *** Caught signal
> >       > (Aborted) **
> >       >
> >       >  in thread 7fbd973ce7a0
> >       >
> >       >
> >       >  ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
> >       >  1: /usr/bin/ceph-osd() [0x9bd2a1]
> >       >  2: (()+0xf710) [0x7fbd96373710]
> >       >  3: (gsignal()+0x35) [0x7fbd95245925]
> >       >  4: (abort()+0x175) [0x7fbd95247105]
> >       >  5: (__gnu_cxx::__verbose_terminate_handler()+0x12d) [0x7fbd95affa5d]
> >       >  6: (()+0xbcbe6) [0x7fbd95afdbe6]
> >       >  7: (()+0xbcc13) [0x7fbd95afdc13]
> >       >  8: (()+0xbcd0e) [0x7fbd95afdd0e]
> >       >  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> >       > const*)+0x7f2) [0xafbe22]
> >       >  10: (PG::peek_map_epoch(ObjectStore*, coll_t, hobject_t&,
> >       > ceph::buffer::list*)+0x4ea) [0x7f729a]
> >       >  11: (OSD::load_pgs()+0x18f1) [0x64f2b1]
> >       >  12: (OSD::init()+0x22c0) [0x6536f0]
> >       >  13: (main()+0x35bc) [0x5fe39c]
> >       >  14: (__libc_start_main()+0xfd) [0x7fbd95231d1d]
> >       >  15: /usr/bin/ceph-osd() [0x5f9e49]
> >       >  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
> >       > interpret this.
> >
> > Hey, this looks like a different report we saw recently off-list!  In that
> > case, they were upgrading from 0.80.4 to 0.80.7.  Opening #10105.
> >
> > Are you only restarting a single OSD?  I would hold off on restarting any
> > more for the time being.
> >
> > Can you attach the output from
> >
> >         ls /var/lib/ceph/osd/ceph-NNN/current
> > and
> >         ceph-kvstore-tool /var/lib/ceph/osd/ceph-NNN/current/omap list
> >
> > (you may need to install the ceph-tests rpm to get ceph-kvstore-tool).
> >
> > Thanks!
> > sage
> >
> >
> > >
> > >
> > > --- logging levels ---
> > >
> > >    0/ 5 none
> > >
> > >    0/ 1 lockdep
> > >
> > >    0/ 1 context
> > >
> > >    1/ 1 crush
> > >
> > >    1/ 5 mds
> > >
> > >    1/ 5 mds_balancer
> > >
> > >    1/ 5 mds_locker
> > >
> > >    1/ 5 mds_log
> > >
> > >    1/ 5 mds_log_expire
> > >
> > >    1/ 5 mds_migrator
> > >
> > >    0/ 1 buffer
> > >
> > >    0/ 1 timer
> > >
> > >    0/ 1 filer
> > >
> > >    0/ 1 striper
> > >
> > >    0/ 1 objecter
> > >
> > >    0/ 5 rados
> > >
> > >    0/ 5 rbd
> > >
> > >    0/ 5 journaler
> > >
> > >    0/ 5 objectcacher
> > >
> > >    0/ 5 client
> > >
> > >    0/ 5 osd
> > >
> > >    0/ 5 optracker
> > >
> > >    0/ 5 objclass
> > >
> > >    1/ 3 filestore
> > >
> > >    1/ 3 keyvaluestore
> > >
> > >    1/ 3 journal
> > >
> > >    0/ 5 ms
> > >
> > >    1/ 5 mon
> > >
> > >    0/10 monc
> > >
> > >    1/ 5 paxos
> > >
> > >    0/ 5 tp
> > >
> > >    1/ 5 auth
> > >
> > >    1/ 5 crypto
> > >
> > >    1/ 1 finisher
> > >
> > >    1/ 5 heartbeatmap
> > >
> > >    1/ 5 perfcounter
> > >
> > >    1/ 5 rgw
> > >
> > >    1/ 5 javaclient
> > >
> > >    1/ 5 asok
> > >
> > >    1/ 1 throttle
> > >
> > >   -2/-2 (syslog threshold)
> > >
> > >   -1/-1 (stderr threshold)
> > >
> > >   max_recent     10000
> > >
> > >   max_new         1000
> > >
> > >   log_file /var/log/ceph/us-west01-osd.0.log
> > >
> > > --- end dump of recent events ---
> > >
> > >
> > >
> >
> >
> >
> >
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
