Just out of curiosity, how did you come across this file? I suspect it would help someone who runs into a similar issue.
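For anyone who finds a similar stray object file under an OSD's data directory, here is a rough sketch of how to check what it belongs to before touching it (the pool name 'rbd' below is only an assumption; ceph osd lspools shows which pool actually has id 3, the last field of the filename):

    # The filename encodes the object name (everything before "__head"),
    # the object's hash, and the pool id; the "3.2_head" directory is PG 3.2.
    ceph osd lspools                                    # map pool id 3 to a pool name
    ceph osd map rbd rb.0.19f2e.238e1f29.000000000728   # which PG and OSDs the object maps to
    ceph pg 3.2 query                                   # state of the PG the file was found under

The file in question: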
/var/lib/ceph/osd/ceph-4/current/3.2_head/rb.0.19f2e.238e1f29.000000000728__head_813E90A3__3

Cheers,
S

On Thu, Mar 3, 2016 at 3:58 PM, Alexander Gubanov <sht...@gmail.com> wrote:
> None of this happened. After the OSDs fell over I found this file:
> /var/lib/ceph/osd/ceph-4/current/3.2_head/rb.0.19f2e.238e1f29.000000000728__head_813E90A3__3.
> The location of this file seemed very strange to me, so I simply removed it,
> and then all OSDs started up.
>
>
> On Fri, Feb 26, 2016 at 7:03 PM, Alexey Sheplyakov
> <asheplya...@mirantis.com> wrote:
>>
>> Alexander,
>>
>> > # ceph osd pool get-quota cache
>> > quotas for pool 'cache':
>> >   max objects: N/A
>> >   max bytes  : N/A
>> > But I set target_max_bytes:
>> > # ceph osd pool set cache target_max_bytes 1000000000000
>> > Could that be the reason?
>>
>> I've been unable to reproduce http://tracker.ceph.com/issues/13098
>> without setting max_bytes.
>> Perhaps you hit a different bug.
>>
>> > Every time, 2 of the 18 OSDs crash.
>>
>> How did they get into that state? Were some of the OSDs full or nearly
>> full? Has the cache pool ever reached its target_max_bytes? Anything else
>> which might be relevant?
>>
>> Best regards,
>> Alexey
>>
>>
>> On Wed, Feb 24, 2016 at 7:36 PM, Alexander Gubanov <sht...@gmail.com>
>> wrote:
>> > Hmm. It seems that the cache pool quotas have not been set; at least I'm
>> > sure I didn't set them, so maybe they are at their defaults.
>> >
>> > # ceph osd pool get-quota cache
>> > quotas for pool 'cache':
>> >   max objects: N/A
>> >   max bytes  : N/A
>> >
>> > But I set target_max_bytes:
>> >
>> > # ceph osd pool set cache target_max_bytes 1000000000000
>> >
>> > Could that be the reason?
>> >
>> > On Wed, Feb 24, 2016 at 4:08 PM, Alexey Sheplyakov
>> > <asheplya...@mirantis.com> wrote:
>> >>
>> >> Hi,
>> >>
>> >> > 0> 2016-02-24 04:51:45.884445 7fd994825700 -1 osd/ReplicatedPG.cc: In function 'int ReplicatedPG::fill_in_copy_get(ReplicatedPG::OpContext*, ceph::buffer::list::iterator&, OSDOp&, ObjectContextRef&, bool)' thread 7fd994825700 time 2016-02-24 04:51:45.870995
>> >> > osd/ReplicatedPG.cc: 5558: FAILED assert(cursor.data_complete)
>> >> > ceph version 0.80.11-8-g95c4287 (95c4287b5d24b762bc8538633c5bb2918ecfe4dd)
>> >>
>> >> This one looks familiar: http://tracker.ceph.com/issues/13098
>> >>
>> >> A quick workaround is to unset the cache pool quota:
>> >>
>> >> ceph osd pool set-quota $cache_pool_name max_bytes 0
>> >> ceph osd pool set-quota $cache_pool_name max_objects 0
>> >>
>> >> The problem has been properly fixed in infernalis v9.1.0, and
>> >> (partially) in hammer (v0.94.6, which will be released soon).
>> >>
>> >> Best regards,
>> >> Alexey
>> >>
>> >>
>> >> On Wed, Feb 24, 2016 at 5:37 AM, Alexander Gubanov <sht...@gmail.com>
>> >> wrote:
>> >> > Hi,
>> >> >
>> >> > Every time, 2 of the 18 OSDs crash. I think it happens when PG
>> >> > replication runs, because only 2 OSDs crash and they are the same
>> >> > ones every time.
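(A note for anyone who hits the same assert(cursor.data_complete) later: before and after applying the workaround Alexey posted above, it is probably worth confirming what the cache pool quota and target_max_bytes are actually set to. Roughly, with the cache pool named 'cache' as in this thread:

    ceph osd pool get-quota cache                # reports max_objects / max_bytes
    ceph osd dump | grep "pool.*'cache'"         # the pool line should also show target_bytes when it is set
    ceph osd pool set-quota cache max_bytes 0    # the workaround above: clear the quota
    ceph osd pool set-quota cache max_objects 0

On newer releases, ceph osd pool get cache target_max_bytes should report the target directly as well.)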
>> >> >
>> >> >     0> 2016-02-24 04:51:45.884445 7fd994825700 -1 osd/ReplicatedPG.cc: In function 'int ReplicatedPG::fill_in_copy_get(ReplicatedPG::OpContext*, ceph::buffer::list::iterator&, OSDOp&, ObjectContextRef&, bool)' thread 7fd994825700 time 2016-02-24 04:51:45.870995
>> >> > osd/ReplicatedPG.cc: 5558: FAILED assert(cursor.data_complete)
>> >> >
>> >> > ceph version 0.80.11-8-g95c4287 (95c4287b5d24b762bc8538633c5bb2918ecfe4dd)
>> >> > 1: (ReplicatedPG::fill_in_copy_get(ReplicatedPG::OpContext*, ceph::buffer::list::iterator&, OSDOp&, std::tr1::shared_ptr<ObjectContext>&, bool)+0xffc) [0x7c1f7c]
>> >> > 2: (ReplicatedPG::do_osd_ops(ReplicatedPG::OpContext*, std::vector<OSDOp, std::allocator<OSDOp> >&)+0x4171) [0x809f21]
>> >> > 3: (ReplicatedPG::prepare_transaction(ReplicatedPG::OpContext*)+0x62) [0x814622]
>> >> > 4: (ReplicatedPG::execute_ctx(ReplicatedPG::OpContext*)+0x5f8) [0x815098]
>> >> > 5: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>)+0x3dd4) [0x81a3f4]
>> >> > 6: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x66d) [0x7b4ecd]
>> >> > 7: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3a5) [0x600ee5]
>> >> > 8: (OSD::OpWQ::_process(boost::intrusive_ptr<PG>, ThreadPool::TPHandle&)+0x203) [0x61cba3]
>> >> > 9: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest> >, boost::intrusive_ptr<PG> >::_void_process(void*, ThreadPool::TPHandle&)+0xac) [0x660f2c]
>> >> > 10: (ThreadPool::worker(ThreadPool::WorkThread*)+0xb20) [0xa7def0]
>> >> > 11: (ThreadPool::WorkThread::entry()+0x10) [0xa7ede0]
>> >> > 12: (()+0x7dc5) [0x7fd9ad03edc5]
>> >> > 13: (clone()+0x6d) [0x7fd9abd2828d]
>> >> > NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>> >> >
>> >> > --- logging levels ---
>> >> >   0/ 5 none
>> >> >   0/ 1 lockdep
>> >> >   0/ 1 context
>> >> >   1/ 1 crush
>> >> >   1/ 5 mds
>> >> >   1/ 5 mds_balancer
>> >> >   1/ 5 mds_locker
>> >> >   1/ 5 mds_log
>> >> >   1/ 5 mds_log_expire
>> >> >   1/ 5 mds_migrator
>> >> >   0/ 1 buffer
>> >> >   0/ 1 timer
>> >> >   0/ 1 filer
>> >> >   0/ 1 striper
>> >> >   0/ 1 objecter
>> >> >   0/ 5 rados
>> >> >   0/ 5 rbd
>> >> >   0/ 5 journaler
>> >> >   0/ 5 objectcacher
>> >> >   0/ 5 client
>> >> >   0/ 5 osd
>> >> >   0/ 5 optracker
>> >> >   0/ 5 objclass
>> >> >   1/ 3 filestore
>> >> >   1/ 3 keyvaluestore
>> >> >   1/ 3 journal
>> >> >   0/ 5 ms
>> >> >   1/ 5 mon
>> >> >   0/10 monc
>> >> >   1/ 5 paxos
>> >> >   0/ 5 tp
>> >> >   1/ 5 auth
>> >> >   1/ 5 crypto
>> >> >   1/ 1 finisher
>> >> >   1/ 5 heartbeatmap
>> >> >   1/ 5 perfcounter
>> >> >   1/ 5 rgw
>> >> >   1/10 civetweb
>> >> >   1/ 5 javaclient
>> >> >   1/ 5 asok
>> >> >   1/ 1 throttle
>> >> >   -2/-2 (syslog threshold)
>> >> >   -1/-1 (stderr threshold)
>> >> >   max_recent 10000
>> >> >   max_new 1000
>> >> >   log_file /var/log/ceph/ceph-osd.3.log
>> >> > --- end dump of recent events ---
>> >> > 2016-02-24 04:51:45.944447 7fd994825700 -1 *** Caught signal (Aborted) **
>> >> > in thread 7fd994825700
>> >> >
>> >> > ceph version 0.80.11-8-g95c4287 (95c4287b5d24b762bc8538633c5bb2918ecfe4dd)
>> >> > 1: /usr/bin/ceph-osd() [0x9a24f6]
>> >> > 2: (()+0xf100) [0x7fd9ad046100]
>> >> > 3: (gsignal()+0x37) [0x7fd9abc675f7]
>> >> > 4: (abort()+0x148) [0x7fd9abc68ce8]
>> >> > 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7fd9ac56b9d5]
>> >> > 6: (()+0x5e946) [0x7fd9ac569946]
>> >> > 7: (()+0x5e973) [0x7fd9ac569973]
>> >> > 8: (()+0x5eb93) [0x7fd9ac569b93]
>> >> > 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1ef) [0xa8d9df]
>> >> > 10: (ReplicatedPG::fill_in_copy_get(ReplicatedPG::OpContext*, ceph::buffer::list::iterator&, OSDOp&, std::tr1::shared_ptr<ObjectContext>&, bool)+0xffc) [0x7c1f7c]
>> >> > 11: (ReplicatedPG::do_osd_ops(ReplicatedPG::OpContext*, std::vector<OSDOp, std::allocator<OSDOp> >&)+0x4171) [0x809f21]
>> >> > 12: (ReplicatedPG::prepare_transaction(ReplicatedPG::OpContext*)+0x62) [0x814622]
>> >> > 13: (ReplicatedPG::execute_ctx(ReplicatedPG::OpContext*)+0x5f8) [0x815098]
>> >> > 14: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>)+0x3dd4) [0x81a3f4]
>> >> > 15: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x66d) [0x7b4ecd]
>> >> > 16: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3a5) [0x600ee5]
>> >> > 17: (OSD::OpWQ::_process(boost::intrusive_ptr<PG>, ThreadPool::TPHandle&)+0x203) [0x61cba3]
>> >> > 18: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest> >, boost::intrusive_ptr<PG> >::_void_process(void*, ThreadPool::TPHandle&)+0xac) [0x660f2c]
>> >> > 19: (ThreadPool::worker(ThreadPool::WorkThread*)+0xb20) [0xa7def0]
>> >> > 20: (ThreadPool::WorkThread::entry()+0x10) [0xa7ede0]
>> >> > 21: (()+0x7dc5) [0x7fd9ad03edc5]
>> >> > 22: (clone()+0x6d) [0x7fd9abd2828d]
>> >> > NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>> >> >
>> >> > --- begin dump of recent events ---
>> >> >    -5> 2016-02-24 04:51:45.904559 7fd995026700  5 -- op tracker -- , seq: 19230, time: 2016-02-24 04:51:45.904559, event: started, request: osd_op(osd.13.12097:806246 rb.0.218d6.238e1f29.000000010db3@snapdir [list-snaps] 3.94c2bed2 ack+read+ignore_cache+ignore_overlay+map_snap_clone e13252) v4
>> >> >    -4> 2016-02-24 04:51:45.904598 7fd995026700  1 -- 172.16.0.1:6801/419703 --> 172.16.0.3:6844/12260 -- osd_op_reply(806246 rb.0.218d6.238e1f29.000000010db3 [list-snaps] v0'0 uv27683057 ondisk = 0) v6 -- ?+0 0x9f90800 con 0x1b7838c0
>> >> >    -3> 2016-02-24 04:51:45.904616 7fd995026700  5 -- op tracker -- , seq: 19230, time: 2016-02-24 04:51:45.904616, event: done, request: osd_op(osd.13.12097:806246 rb.0.218d6.238e1f29.000000010db3@snapdir [list-snaps] 3.94c2bed2 ack+read+ignore_cache+ignore_overlay+map_snap_clone e13252) v4
>> >> >    -2> 2016-02-24 04:51:45.904637 7fd995026700  5 -- op tracker -- , seq: 19231, time: 2016-02-24 04:51:45.904637, event: reached_pg, request: osd_op(osd.13.12097:806247 rb.0.218d6.238e1f29.000000010db3 [copy-get max 8388608] 3.94c2bed2 ack+read+ignore_cache+ignore_overlay+map_snap_clone e13252) v4
>> >> >    -1> 2016-02-24 04:51:45.904673 7fd995026700  5 -- op tracker -- , seq: 19231, time: 2016-02-24 04:51:45.904673, event: started, request: osd_op(osd.13.12097:806247 rb.0.218d6.238e1f29.000000010db3 [copy-get max 8388608] 3.94c2bed2 ack+read+ignore_cache+ignore_overlay+map_snap_clone e13252) v4
>> >> >     0> 2016-02-24 04:51:45.944447 7fd994825700 -1 *** Caught signal (Aborted) **
>> >> > in thread 7fd994825700
>> >> >
>> >> > ceph version 0.80.11-8-g95c4287 (95c4287b5d24b762bc8538633c5bb2918ecfe4dd)
>> >> > 1: /usr/bin/ceph-osd() [0x9a24f6]
>> >> > 2: (()+0xf100) [0x7fd9ad046100]
>> >> > 3: (gsignal()+0x37) [0x7fd9abc675f7]
>> >> > 4: (abort()+0x148) [0x7fd9abc68ce8]
>> >> > 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7fd9ac56b9d5]
>> >> > 6: (()+0x5e946) [0x7fd9ac569946]
>> >> > 7: (()+0x5e973) [0x7fd9ac569973]
>> >> > 8: (()+0x5eb93) [0x7fd9ac569b93]
>> >> > 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1ef) [0xa8d9df]
>> >> > 10: (ReplicatedPG::fill_in_copy_get(ReplicatedPG::OpContext*, ceph::buffer::list::iterator&, OSDOp&, std::tr1::shared_ptr<ObjectContext>&, bool)+0xffc) [0x7c1f7c]
>> >> > 11: (ReplicatedPG::do_osd_ops(ReplicatedPG::OpContext*, std::vector<OSDOp, std::allocator<OSDOp> >&)+0x4171) [0x809f21]
>> >> > 12: (ReplicatedPG::prepare_transaction(ReplicatedPG::OpContext*)+0x62) [0x814622]
>> >> > 13: (ReplicatedPG::execute_ctx(ReplicatedPG::OpContext*)+0x5f8) [0x815098]
>> >> > 14: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>)+0x3dd4) [0x81a3f4]
>> >> > 15: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x66d) [0x7b4ecd]
>> >> > 16: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3a5) [0x600ee5]
>> >> > 17: (OSD::OpWQ::_process(boost::intrusive_ptr<PG>, ThreadPool::TPHandle&)+0x203) [0x61cba3]
>> >> > 18: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest> >, boost::intrusive_ptr<PG> >::_void_process(void*, ThreadPool::TPHandle&)+0xac) [0x660f2c]
>> >> > 19: (ThreadPool::worker(ThreadPool::WorkThread*)+0xb20) [0xa7def0]
>> >> > 20: (ThreadPool::WorkThread::entry()+0x10) [0xa7ede0]
>> >> > 21: (()+0x7dc5) [0x7fd9ad03edc5]
>> >> > 22: (clone()+0x6d) [0x7fd9abd2828d]
>> >> > NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>> >> >
>> >> > --- logging levels ---
>> >> >   0/ 5 none
>> >> >   0/ 1 lockdep
>> >> >   0/ 1 context
>> >> >   1/ 1 crush
>> >> >   1/ 5 mds
>> >> >   1/ 5 mds_balancer
>> >> >   1/ 5 mds_locker
>> >> >   1/ 5 mds_log
>> >> >   1/ 5 mds_log_expire
>> >> >   1/ 5 mds_migrator
>> >> >   0/ 1 buffer
>> >> >   0/ 1 timer
>> >> >   0/ 1 filer
>> >> >   0/ 1 striper
>> >> >   0/ 1 objecter
>> >> >   0/ 5 rados
>> >> >   0/ 5 rbd
>> >> >   0/ 5 journaler
>> >> >   0/ 5 objectcacher
>> >> >   0/ 5 client
>> >> >   0/ 5 osd
>> >> >   0/ 5 optracker
>> >> >   0/ 5 objclass
>> >> >   1/ 3 filestore
>> >> >   1/ 3 keyvaluestore
>> >> >   1/ 3 journal
>> >> >   0/ 5 ms
>> >> >   1/ 5 mon
>> >> >   0/10 monc
>> >> >   1/ 5 paxos
>> >> >   0/ 5 tp
>> >> >   1/ 5 auth
>> >> >   1/ 5 crypto
>> >> >   1/ 1 finisher
>> >> >   1/ 5 heartbeatmap
>> >> >   1/ 5 perfcounter
>> >> >   1/ 5 rgw
>> >> >   1/10 civetweb
>> >> >   1/ 5 javaclient
>> >> >   1/ 5 asok
>> >> >   1/ 1 throttle
>> >> >   -2/-2 (syslog threshold)
>> >> >   -1/-1 (stderr threshold)
>> >> >   max_recent 10000
>> >> >   max_new 1000
>> >> >   log_file /var/log/ceph/ceph-osd.3.log
>> >> > --- end dump of recent events ---
>> >> >
>> >> > --
>> >> > Alexander Gubanov
>> >
>> > --
>> > Alexander Gubanov
>
> --
> Alexander Gubanov

--
Email: shin...@linux.com
GitHub: shinobu-x
Blog: Life with Distributed Computational System based on OpenSource
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com