Hi, several OSDs went down/out, each with logs similar to the one below. Could you help?

-38> 2014-06-05 10:27:54.700832 7f2ceead6700 1 -- 192.168.40.11:6800/19542 <== osd.11 192.168.40.11:6822/20298 2 ==== pg_notify(0.aa4(2) epoch 7) v5 ==== 812+0+0 (3873498789 0 0) 0x57a0540 con 0x49d14a0
-37> 2014-06-05 10:27:54.700874 7f2ceead6700 5 -- op tracker -- , seq: 2028, time: 2014-06-05 10:27:54.396463, event: header_read, request: pg_notify(0.aa4(2) epoch 7) v5
-36> 2014-06-05 10:27:54.701091 7f2ceead6700 5 -- op tracker -- , seq: 2028, time: 2014-06-05 10:27:54.396465, event: throttled, request: pg_notify(0.aa4(2) epoch 7) v5
-35> 2014-06-05 10:27:54.701126 7f2ceead6700 5 -- op tracker -- , seq: 2028, time: 2014-06-05 10:27:54.396543, event: all_read, request: pg_notify(0.aa4(2) epoch 7) v5
-34> 2014-06-05 10:27:54.701187 7f2ceead6700 5 -- op tracker -- , seq: 2028, time: 2014-06-05 10:27:54.700870, event: dispatched, request: pg_notify(0.aa4(2) epoch 7) v5
-33> 2014-06-05 10:27:54.701235 7f2ceead6700 5 -- op tracker -- , seq: 2028, time: 2014-06-05 10:27:54.701234, event: waiting_for_osdmap, request: pg_notify(0.aa4(2) epoch 7) v5
-32> 2014-06-05 10:27:54.701267 7f2ceead6700 5 -- op tracker -- , seq: 2028, time: 2014-06-05 10:27:54.701267, event: started, request: pg_notify(0.aa4(2) epoch 7) v5
-31> 2014-06-05 10:27:54.700940 7f2ce92cb700 5 osd.0 pg_epoch: 7 pg[0.1dc1( empty local-les=4 n=0 ec=1 les/c 4/4 7/7/7) [69,56,21] r=-1 lpr=7 pi=3-6/1 crt=0'0 inactive NOTIFY] exit Start 0.003321 0 0.000000
-30> 2014-06-05 10:27:54.702163 7f2ceead6700 5 -- op tracker -- , seq: 2028, time: 2014-06-05 10:27:54.702163, event: done, request: pg_notify(0.aa4(2) epoch 7) v5
-29> 2014-06-05 10:27:54.701690 7f2ce92cb700 5 osd.0 pg_epoch: 7 pg[0.1dc1( empty local-les=4 n=0 ec=1 les/c 4/4 7/7/7) [69,56,21] r=-1 lpr=7 pi=3-6/1 crt=0'0 inactive NOTIFY] enter Started/Stray
-28> 2014-06-05 10:27:54.697805 7f2ce8aca700 5 osd.0 pg_epoch: 7 pg[1.98( empty local-les=4 n=0 ec=1 les/c 4/4 3/3/3) [0,100,64] r=0 lpr=3 crt=0'0 mlcod 0'0 active] exit Started/Primary/Active 7.527967 0 0.000000
-27> 2014-06-05 10:27:54.702437 7f2ce8aca700 5 osd.0 pg_epoch: 7 pg[1.98( empty local-les=4 n=0 ec=1 les/c 4/4 3/3/3) [0,100,64] r=0 lpr=3 crt=0'0 mlcod 0'0 active] exit Started/Primary 8.986091 0 0.000000
-26> 2014-06-05 10:27:54.702467 7f2ce8aca700 5 osd.0 pg_epoch: 7 pg[1.98( empty local-les=4 n=0 ec=1 les/c 4/4 3/3/3) [0,100,64] r=0 lpr=3 crt=0'0 mlcod 0'0 active] exit Started 8.986256 0 0.000000
-25> 2014-06-05 10:27:54.702486 7f2ce8aca700 5 osd.0 pg_epoch: 7 pg[1.98( empty local-les=4 n=0 ec=1 les/c 4/4 3/3/3) [0,100,64] r=0 lpr=3 crt=0'0 mlcod 0'0 active] enter Reset
-24> 2014-06-05 10:27:54.702822 7f2ce8aca700 5 osd.0 pg_epoch: 7 pg[1.98( empty local-les=4 n=0 ec=1 les/c 4/4 7/7/7) [54,95,68] r=-1 lpr=7 pi=3-6/1 crt=0'0 inactive NOTIFY] exit Reset 0.000336 1 0.005574
-23> 2014-06-05 10:27:54.702900 7f2ce8aca700 5 osd.0 pg_epoch: 7 pg[1.98( empty local-les=4 n=0 ec=1 les/c 4/4 7/7/7) [54,95,68] r=-1 lpr=7 pi=3-6/1 crt=0'0 inactive NOTIFY] enter Started
-22> 2014-06-05 10:27:54.703025 7f2ce8aca700 5 osd.0 pg_epoch: 7 pg[1.98( empty local-les=4 n=0 ec=1 les/c 4/4 7/7/7) [54,95,68] r=-1 lpr=7 pi=3-6/1 crt=0'0 inactive NOTIFY] enter Start
-21> 2014-06-05 10:27:54.703235 7f2ce8aca700 1 osd.0 pg_epoch: 7 pg[1.98( empty local-les=4 n=0 ec=1 les/c 4/4 7/7/7) [54,95,68] r=-1 lpr=7 pi=3-6/1 crt=0'0 inactive NOTIFY] state<Start>: transitioning to Stray
-20> 2014-06-05 10:27:54.704919 7f2ce8aca700 5 osd.0 pg_epoch: 7 pg[1.98( empty local-les=4 n=0 ec=1 les/c 4/4 7/7/7) [54,95,68] r=-1 lpr=7 pi=3-6/1 crt=0'0 inactive NOTIFY] exit Start 0.001894 0 0.000000
-19> 2014-06-05 10:27:54.704835 7f2ceead6700 1 -- 192.168.40.11:6800/19542 <== osd.15 192.168.40.11:6830/20581 1 ==== pg_notify(2.60(2) epoch 7) v5 ==== 812+0+0 (3557830937 0 0) 0x4a6f540 con 0x3e3bb20
-18> 2014-06-05 10:27:54.705050 7f2ceead6700 5 -- op tracker -- , seq: 2029, time: 2014-06-05 10:27:54.598458, event: header_read, request: pg_notify(2.60(2) epoch 7) v5
-17> 2014-06-05 10:27:54.705079 7f2ceead6700 5 -- op tracker -- , seq: 2029, time: 2014-06-05 10:27:54.598461, event: throttled, request: pg_notify(2.60(2) epoch 7) v5
-16> 2014-06-05 10:27:54.705124 7f2ce8aca700 5 osd.0 pg_epoch: 7 pg[1.98( empty local-les=4 n=0 ec=1 les/c 4/4 7/7/7) [54,95,68] r=-1 lpr=7 pi=3-6/1 crt=0'0 inactive NOTIFY] enter Started/Stray
-15> 2014-06-05 10:27:54.705353 7f2ceead6700 5 -- op tracker -- , seq: 2029, time: 2014-06-05 10:27:54.598689, event: all_read, request: pg_notify(2.60(2) epoch 7) v5
-14> 2014-06-05 10:27:54.706925 7f2ceead6700 5 -- op tracker -- , seq: 2029, time: 2014-06-05 10:27:54.705046, event: dispatched, request: pg_notify(2.60(2) epoch 7) v5
-13> 2014-06-05 10:27:54.707038 7f2ceead6700 5 -- op tracker -- , seq: 2029, time: 2014-06-05 10:27:54.707037, event: waiting_for_osdmap, request: pg_notify(2.60(2) epoch 7) v5
-12> 2014-06-05 10:27:54.707075 7f2ceead6700 5 -- op tracker -- , seq: 2029, time: 2014-06-05 10:27:54.707075, event: started, request: pg_notify(2.60(2) epoch 7) v5
-11> 2014-06-05 10:27:54.708042 7f2ceead6700 5 osd.0 pg_epoch: 7 pg[2.60(unlocked)] enter Initial
-10> 2014-06-05 10:27:54.709885 7f2ceead6700 5 osd.0 pg_epoch: 7 pg[2.60( empty local-les=0 n=0 ec=1 les/c 4/5 7/7/7) [0,117,49] r=0 lpr=0 pi=1-6/2 crt=0'0 mlcod 0'0 inactive] exit Initial 0.001842 0 0.000000
-9> 2014-06-05 10:27:54.711556 7f2ceead6700 5 osd.0 pg_epoch: 7 pg[2.60( empty local-les=0 n=0 ec=1 les/c 4/5 7/7/7) [0,117,49] r=0 lpr=0 pi=1-6/2 crt=0'0 mlcod 0'0 inactive] enter Reset
-8> 2014-06-05 10:27:54.711606 7f2ceead6700 5 osd.0 pg_epoch: 7 pg[2.60( empty local-les=0 n=0 ec=1 les/c 4/5 7/7/7) [0,117,49] r=0 lpr=7 pi=1-6/2 crt=0'0 mlcod 0'0 inactive] exit Reset 0.000051 1 0.001718
-7> 2014-06-05 10:27:54.711648 7f2ceead6700 5 osd.0 pg_epoch: 7 pg[2.60( empty local-les=0 n=0 ec=1 les/c 4/5 7/7/7) [0,117,49] r=0 lpr=7 pi=1-6/2 crt=0'0 mlcod 0'0 inactive] enter Started
-6> 2014-06-05 10:27:54.711714 7f2ceead6700 5 osd.0 pg_epoch: 7 pg[2.60( empty local-les=0 n=0 ec=1 les/c 4/5 7/7/7) [0,117,49] r=0 lpr=7 pi=1-6/2 crt=0'0 mlcod 0'0 inactive] enter Start
-5> 2014-06-05 10:27:54.711781 7f2ceead6700 1 osd.0 pg_epoch: 7 pg[2.60( empty local-les=0 n=0 ec=1 les/c 4/5 7/7/7) [0,117,49] r=0 lpr=7 pi=1-6/2 crt=0'0 mlcod 0'0 inactive] state<Start>: transitioning to Primary
-4> 2014-06-05 10:27:54.713010 7f2ceead6700 5 osd.0 pg_epoch: 7 pg[2.60( empty local-les=0 n=0 ec=1 les/c 4/5 7/7/7) [0,117,49] r=0 lpr=7 pi=1-6/2 crt=0'0 mlcod 0'0 inactive] exit Start 0.001295 0 0.000000
-3> 2014-06-05 10:27:54.713164 7f2ceead6700 5 osd.0 pg_epoch: 7 pg[2.60( empty local-les=0 n=0 ec=1 les/c 4/5 7/7/7) [0,117,49] r=0 lpr=7 pi=1-6/2 crt=0'0 mlcod 0'0 inactive] enter Started/Primary
-2> 2014-06-05 10:27:54.713234 7f2ceead6700 5 osd.0 pg_epoch: 7 pg[2.60( empty local-les=0 n=0 ec=1 les/c 4/5 7/7/7) [0,117,49] r=0 lpr=7 pi=1-6/2 crt=0'0 mlcod 0'0 inactive] enter Started/Primary/Peering
-1> 2014-06-05 10:27:54.713319 7f2ceead6700 5 osd.0 pg_epoch: 7 pg[2.60( empty local-les=0 n=0 ec=1 les/c 4/5 7/7/7) [0,117,49] r=0 lpr=7 pi=1-6/2 crt=0'0 mlcod 0'0 peering] enter Started/Primary/Peering/GetInfo
0> 2014-06-05 10:27:54.711223 7f2ce92cb700 -1 common/Thread.cc: In function 'void Thread::create(size_t)' thread 7f2ce92cb700 time 2014-06-05 10:27:54.703693
common/Thread.cc: 110: FAILED assert(ret == 0)

 ceph version 0.80 (b78644e7dee100e48dfeca32c9270a6b210d3003)
 1: (Thread::create(unsigned long)+0x8a) [0xa82dea]
 2: (SimpleMessenger::connect_rank(entity_addr_t const&, int, Connection*, Message*)+0x17b) [0xa2b01b]
 3: (SimpleMessenger::get_connection(entity_inst_t const&)+0x180) [0xa2f6c0]
 4: (OSDService::get_con_osd_cluster(int, unsigned int)+0x1d9) [0x6061c9]
 5: (OSD::compat_must_dispatch_immediately(PG*)+0x296) [0x6064a6]
 6: (OSD::process_peering_events(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x4a6) [0x6475d6]
 7: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x16) [0x69dbf6]
 8: (ThreadPool::worker(ThreadPool::WorkThread*)+0x551) [0xab4581]
 9: (ThreadPool::WorkThread::entry()+0x10) [0xab75c0]
 10: /lib64/libpthread.so.0() [0x3eebc079d1]
 11: (clone()+0x6d) [0x3eeb4e8b6d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 keyvaluestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent 10000
  max_new 1000
  log_file /var/log/ceph/ceph-osd.0.log
--- end dump of recent events ---
2014-06-05 10:27:55.096296 7f2ca8628700 0 -- 192.168.40.11:6800/19542 >> 192.168.40.11:6808/19811 pipe(0x4adc100 sd=629 :6800 s=2 pgs=111 cs=1 l=0 c=0x4a0c780).fault with nothing to send, going to standby
2014-06-05 10:27:55.129281 7f2ce92cb700 -1 *** Caught signal (Aborted) **
 in thread 7f2ce92cb700

 ceph version 0.80 (b78644e7dee100e48dfeca32c9270a6b210d3003)
 1: /usr/bin/ceph-osd() [0x9aa211]
 2: /lib64/libpthread.so.0() [0x3eebc0f710]
 3: (gsignal()+0x35) [0x3eeb432925]
 4: (abort()+0x175) [0x3eeb434105]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x12d) [0x3eed8bea5d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
-6> 2014-06-05 10:27:54.716422 7f2cec2d1700 1 -- 192.168.40.11:6801/19542 <== osd.119 192.168.40.14:0/29977 3 ==== osd_ping(ping e7 stamp 2014-06-05 10:27:54.713281) v2 ==== 47+0+0 (243568413 0 0) 0x4a68380 con 0x3e3c200
-5> 2014-06-05 10:27:54.716480 7f2cec2d1700 1 -- 192.168.40.11:6801/19542 --> 192.168.40.14:0/29977 -- osd_ping(ping_reply e7 stamp 2014-06-05 10:27:54.713281) v2 -- ?+0 0x5436740 con 0x3e3c200
-4> 2014-06-05 10:27:54.717351 7f2ced2d3700 1 -- 192.168.50.11:6801/19542 <== osd.119 192.168.40.14:0/29977 3 ==== osd_ping(ping e7 stamp 2014-06-05 10:27:54.713281) v2 ==== 47+0+0 (243568413 0 0) 0x4a68fc0 con 0x3e3c8e0
-3> 2014-06-05 10:27:55.096211 7f2ca8628700 2 -- 192.168.40.11:6800/19542 >> 192.168.40.11:6808/19811 pipe(0x4adc100 sd=629 :6800 s=2 pgs=111 cs=1 l=0 c=0x4a0c780).reader couldn't read tag, (0) Success
-2> 2014-06-05 10:27:55.096258 7f2ca8628700 2 -- 192.168.40.11:6800/19542 >> 192.168.40.11:6808/19811 pipe(0x4adc100 sd=629 :6800 s=2 pgs=111 cs=1 l=0 c=0x4a0c780).fault (0) Success
-1> 2014-06-05 10:27:55.096296 7f2ca8628700 0 -- 192.168.40.11:6800/19542 >> 192.168.40.11:6808/19811 pipe(0x4adc100 sd=629 :6800 s=2 pgs=111 cs=1 l=0 c=0x4a0c780).fault with nothing to send, going to standby
0> 2014-06-05 10:27:55.129281 7f2ce92cb700 -1 *** Caught signal (Aborted) **
 in thread 7f2ce92cb700

 ceph version 0.80 (b78644e7dee100e48dfeca32c9270a6b210d3003)
 1: /usr/bin/ceph-osd() [0x9aa211]
 2: /lib64/libpthread.so.0() [0x3eebc0f710]
 3: (gsignal()+0x35) [0x3eeb432925]
 4: (abort()+0x175) [0x3eeb434105]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x12d) [0x3eed8bea5d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 keyvaluestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent 10000
  max_new 1000
  log_file /var/log/ceph/ceph-osd.0.log
--- end dump of recent events ---

Wei Cao (Buddy)
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com