You need to get your OSDs back online. 



From: "Jeffrey McDonald" <jmcdo...@umn.edu> 
To: ceph-users@lists.ceph.com 
Sent: Saturday, February 6, 2016 8:18:06 AM 
Subject: [ceph-users] CEPH health issues 

Hi, 
I'm seeing lots of issues with my Ceph installation. The health of the system 
is degraded and many of the OSDs are down. 

# ceph -v 
ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43) 

# ceph health 
HEALTH_ERR 2002 pgs degraded; 14 pgs down; 180 pgs inconsistent; 14 pgs 
peering; 1 pgs stale; 2002 pgs stuck degraded; 14 pgs stuck inactive; 1 pgs 
stuck stale; 2320 pgs stuck unclean; 2002 pgs stuck undersized; 2002 pgs 
undersized; 100 requests are blocked > 32 sec; recovery 38033332/531925830 
objects degraded (7.150%); recovery 48881596/531925830 objects misplaced 
(9.190%); 12623 scrub errors; 11/320 in osds are down; noout flag(s) set 
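
As a reading aid, the one-line summary above can be split into its individual items and the PG states ranked by count. This is only a minimal Python sketch: the helper names `parse_health` and `pg_counts` are hypothetical, and it assumes the format shown above (a status word followed by `;`-separated items), which may differ in other Ceph releases.

```python
import re

# Health summary copied from the `ceph health` output above.
HEALTH = (
    "HEALTH_ERR 2002 pgs degraded; 14 pgs down; 180 pgs inconsistent; "
    "14 pgs peering; 1 pgs stale; 2002 pgs stuck degraded; "
    "14 pgs stuck inactive; 1 pgs stuck stale; 2320 pgs stuck unclean; "
    "2002 pgs stuck undersized; 2002 pgs undersized; "
    "100 requests are blocked > 32 sec; "
    "recovery 38033332/531925830 objects degraded (7.150%); "
    "recovery 48881596/531925830 objects misplaced (9.190%); "
    "12623 scrub errors; 11/320 in osds are down; noout flag(s) set"
)


def parse_health(summary):
    """Split a one-line health summary into (status, list of items)."""
    status, _, rest = summary.partition(" ")
    items = [item.strip() for item in rest.split(";") if item.strip()]
    return status, items


def pg_counts(items):
    """Map each PG state (e.g. 'degraded') to its count; skip non-PG items."""
    counts = {}
    for item in items:
        match = re.fullmatch(r"(\d+) pgs (.+)", item)
        if match:
            counts[match.group(2)] = int(match.group(1))
    return counts


status, items = parse_health(HEALTH)
counts = pg_counts(items)

# Print PG states from most to least affected.
for state, count in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(f"{count:6d}  {state}")
```

Items that are not PG counts (blocked requests, recovery ratios, scrub errors, down OSDs, flags) stay in `items` untouched, so nothing from the summary is lost.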

The log for one of the down OSDs shows: 

-5> 2016-02-05 19:10:45.294873 7fd4d58e4700 1 -- 10.31.0.3:6835/157558 --> 10.31.0.5:0/3796 -- osd_ping(ping_reply e144138 stamp 2016-02-05 19:10:45.286934) v2 -- ?+0 0x4359a00 con 0x2bc9ac60 
-4> 2016-02-05 19:10:45.294915 7fd4d70e7700 1 -- 10.31.0.67:6835/157558 --> 10.31.0.5:0/3796 -- osd_ping(ping_reply e144138 stamp 2016-02-05 19:10:45.286934) v2 -- ?+0 0x27e21800 con 0x2bacd700 
-3> 2016-02-05 19:10:45.341383 7fd4e2ea8700 0 filestore(/var/lib/ceph/osd/ceph-299) error (39) Directory not empty not handled on operation 0x12c88178 (6494115.0.1, or op 1, counting from 0) 
-2> 2016-02-05 19:10:45.341477 7fd4e2ea8700 0 filestore(/var/lib/ceph/osd/ceph-299) ENOTEMPTY suggests garbage data in osd data dir 
-1> 2016-02-05 19:10:45.341493 7fd4e2ea8700 0 filestore(/var/lib/ceph/osd/ceph-299) transaction dump: 
{ 
"ops": [ 
{ 
"op_num": 0, 
"op_name": "remove", 
"collection": "70.532s3_head", 
"oid": "532\/\/head\/\/70\/18446744073709551615\/3" 
}, 
{ 
"op_num": 1, 
"op_name": "rmcoll", 
"collection": "70.532s3_head" 
} 
] 
} 

0> 2016-02-05 19:10:45.343794 7fd4e2ea8700 -1 os/FileStore.cc: In function 'unsigned int FileStore::_do_transaction(ObjectStore::Transaction&, uint64_t, int, ThreadPool::TPHandle*)' thread 7fd4e2ea8700 time 2016-02-05 19:10:45.341673 
os/FileStore.cc: 2757: FAILED assert(0 == "unexpected error") 

ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43) 
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) 
[0xbc60eb] 
2: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, 
ThreadPool::TPHandle*)+0xa52) [0x923d12] 
3: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*, 
std::allocator<ObjectStore::Transaction*> >&, unsigned long, 
ThreadPool::TPHandle*)+0x64) [0x92a3a4] 
4: (FileStore::_do_op(FileStore::OpSequencer*, ThreadPool::TPHandle&)+0x16a) 
[0x92a52a] 
5: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e) [0xbb6b4e] 
6: (ThreadPool::WorkThread::entry()+0x10) [0xbb7bf0] 
7: (()+0x8182) [0x7fd4ef916182] 
8: (clone()+0x6d) [0x7fd4ede8147d] 
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to 
interpret this. 

--- logging levels --- 
0/ 5 none 
0/ 1 lockdep 
0/ 1 context 
1/ 1 crush 
1/ 5 mds 
1/ 5 mds_balancer 
1/ 5 mds_locker 
1/ 5 mds_log 
1/ 5 mds_log_expire 
1/ 5 mds_migrator 
0/ 1 buffer 
0/ 1 timer 
0/ 1 filer 
0/ 1 striper 
0/ 1 objecter 
0/ 5 rados 
0/ 5 rbd 
0/ 5 rbd_replay 
0/ 5 journaler 
0/ 5 objectcacher 
0/ 5 client 
0/ 5 osd 
0/ 5 optracker 
0/ 5 objclass 
1/ 3 filestore 
1/ 3 keyvaluestore 
1/ 3 journal 
0/ 5 ms 
1/ 5 mon 
0/10 monc 
1/ 5 paxos 
0/ 5 tp 
1/ 5 auth 
1/ 5 crypto 
1/ 1 finisher 
1/ 5 heartbeatmap 
1/ 5 perfcounter 
1/ 5 rgw 
1/10 civetweb 
1/ 5 javaclient 
1/ 5 asok 
1/ 1 throttle 
0/ 0 refs 
1/ 5 xio 
-2/-2 (syslog threshold) 
-1/-1 (stderr threshold) 
max_recent 10000 
max_new 1000 
log_file /var/log/ceph/ceph-osd.299.log 
--- end dump of recent events --- 
2016-02-05 19:10:45.441428 7fd4e2ea8700 -1 *** Caught signal (Aborted) ** 
in thread 7fd4e2ea8700 

ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43) 
1: /usr/bin/ceph-osd() [0xacd7ba] 
2: (()+0x10340) [0x7fd4ef91e340] 
3: (gsignal()+0x39) [0x7fd4eddbdcc9] 
4: (abort()+0x148) [0x7fd4eddc10d8] 
5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7fd4ee6c8535] 
6: (()+0x5e6d6) [0x7fd4ee6c66d6] 
7: (()+0x5e703) [0x7fd4ee6c6703] 
8: (()+0x5e922) [0x7fd4ee6c6922] 
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x278) 
[0xbc62d8] 
10: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, 
ThreadPool::TPHandle*)+0xa52) [0x923d12] 
11: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*, 
std::allocator<ObjectStore::Transaction*> >&, unsigned long, 
ThreadPool::TPHandle*)+0x64) [0x92a3a4] 
12: (FileStore::_do_op(FileStore::OpSequencer*, ThreadPool::TPHandle&)+0x16a) 
[0x92a52a] 
13: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e) [0xbb6b4e] 
14: (ThreadPool::WorkThread::entry()+0x10) [0xbb7bf0] 
15: (()+0x8182) [0x7fd4ef916182] 
16: (clone()+0x6d) [0x7fd4ede8147d] 
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to 
interpret this. 

--- begin dump of recent events --- 
-4> 2016-02-05 19:10:45.355813 7fd4d58e4700 1 -- 10.31.0.3:6835/157558 <== osd.1 10.31.0.101:0/197780 23431 ==== osd_ping(ping e144138 stamp 2016-02-05 19:10:45.344020) v2 ==== 47+0+0 (1893056775 0 0) 0x36782a00 con 0x2c6c8580 
-3> 2016-02-05 19:10:45.355853 7fd4d58e4700 1 -- 10.31.0.3:6835/157558 --> 10.31.0.101:0/197780 -- osd_ping(ping_reply e144138 stamp 2016-02-05 19:10:45.344020) v2 -- ?+0 0x29702800 con 0x2c6c8580 
-2> 2016-02-05 19:10:45.356076 7fd4d70e7700 1 -- 10.31.0.67:6835/157558 <== osd.1 10.31.0.101:0/197780 23431 ==== osd_ping(ping e144138 stamp 2016-02-05 19:10:45.344020) v2 ==== 47+0+0 (1893056775 0 0) 0x2cf84200 con 0x2bc9c260 
-1> 2016-02-05 19:10:45.356627 7fd4d70e7700 1 -- 10.31.0.67:6835/157558 --> 10.31.0.101:0/197780 -- osd_ping(ping_reply e144138 stamp 2016-02-05 19:10:45.344020) v2 -- ?+0 0x2f5cae00 con 0x2bc9c260 
0> 2016-02-05 19:10:45.441428 7fd4e2ea8700 -1 *** Caught signal (Aborted) ** 
in thread 7fd4e2ea8700 

ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43) 
1: /usr/bin/ceph-osd() [0xacd7ba] 
2: (()+0x10340) [0x7fd4ef91e340] 
3: (gsignal()+0x39) [0x7fd4eddbdcc9] 
4: (abort()+0x148) [0x7fd4eddc10d8] 
5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7fd4ee6c8535] 
6: (()+0x5e6d6) [0x7fd4ee6c66d6] 
7: (()+0x5e703) [0x7fd4ee6c6703] 
8: (()+0x5e922) [0x7fd4ee6c6922] 
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x278) 
[0xbc62d8] 
10: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, 
ThreadPool::TPHandle*)+0xa52) [0x923d12] 
11: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*, 
std::allocator<ObjectStore::Transaction*> >&, unsigned long, 
ThreadPool::TPHandle*)+0x64) [0x92a3a4] 
12: (FileStore::_do_op(FileStore::OpSequencer*, ThreadPool::TPHandle&)+0x16a) 
[0x92a52a] 
13: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e) [0xbb6b4e] 
14: (ThreadPool::WorkThread::entry()+0x10) [0xbb7bf0] 
15: (()+0x8182) [0x7fd4ef916182] 
16: (clone()+0x6d) [0x7fd4ede8147d] 
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to 
interpret this. 

--- logging levels --- 
0/ 5 none 
0/ 1 lockdep 
0/ 1 context 
1/ 1 crush 
1/ 5 mds 
1/ 5 mds_balancer 
1/ 5 mds_locker 
1/ 5 mds_log 
1/ 5 mds_log_expire 
1/ 5 mds_migrator 
0/ 1 buffer 
0/ 1 timer 
0/ 1 filer 
0/ 1 striper 
0/ 1 objecter 
0/ 5 rados 
0/ 5 rbd 
0/ 5 rbd_replay 
0/ 5 journaler 
0/ 5 objectcacher 
0/ 5 client 
0/ 5 osd 
0/ 5 optracker 
0/ 5 objclass 
1/ 3 filestore 
1/ 3 keyvaluestore 
1/ 3 journal 
0/ 5 ms 
1/ 5 mon 
0/10 monc 
1/ 5 paxos 
0/ 5 tp 
1/ 5 auth 
1/ 5 crypto 
1/ 1 finisher 
1/ 5 heartbeatmap 
1/ 5 perfcounter 
1/ 5 rgw 
1/10 civetweb 
1/ 5 javaclient 
1/ 5 asok 
1/ 1 throttle 
0/ 0 refs 
1/ 5 xio 
-2/-2 (syslog threshold) 
-1/-1 (stderr threshold) 
max_recent 10000 
max_new 1000 
log_file /var/log/ceph/ceph-osd.299.log 

------------------------- 
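
To read the failure out of the log: the FileStore error line names the failing operation by index ("op 1, counting from 0"), and the transaction dump printed right after it lists the operations by `op_num`. The following minimal Python sketch matches the two; the function name `failed_op` is hypothetical, and the error line and dump are copied from the log above (with the dump's whitespace condensed).

```python
import json
import re

# Error line and transaction dump copied from the OSD log above.
ERROR_LINE = (
    "filestore(/var/lib/ceph/osd/ceph-299) error (39) Directory not empty "
    "not handled on operation 0x12c88178 "
    "(6494115.0.1, or op 1, counting from 0)"
)

TRANSACTION_DUMP = r"""
{
  "ops": [
    {"op_num": 0, "op_name": "remove", "collection": "70.532s3_head",
     "oid": "532\/\/head\/\/70\/18446744073709551615\/3"},
    {"op_num": 1, "op_name": "rmcoll", "collection": "70.532s3_head"}
  ]
}
"""


def failed_op(error_line, dump_text):
    """Return the op from the transaction dump that the error line points at."""
    match = re.search(r"or op (\d+), counting from 0", error_line)
    if match is None:
        raise ValueError("no op index found in error line")
    index = int(match.group(1))
    ops = json.loads(dump_text)["ops"]
    for op in ops:
        if op["op_num"] == index:
            return op
    raise LookupError(f"op {index} not present in transaction dump")


op = failed_op(ERROR_LINE, TRANSACTION_DUMP)
print(op["op_name"], op["collection"])  # prints "rmcoll 70.532s3_head"
```

In other words, the assert fired while the OSD was removing the collection `70.532s3_head` and found its directory not empty, which is what the "ENOTEMPTY suggests garbage data in osd data dir" message refers to.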


The logs on the other down OSDs look similar. Would the procedure described in 
http://tracker.ceph.com/issues/12428 be the best way to repair these OSDs? 

Thanks, 
Jeff 




-- 
Jeffrey McDonald, PhD
Assistant Director for HPC Operations
Minnesota Supercomputing Institute
University of Minnesota Twin Cities
599 Walter Library           email: jeffrey.mcdon...@msi.umn.edu
117 Pleasant St SE           phone: +1 612 625-6905
Minneapolis, MN 55455        fax:   +1 612 624-8861 


_______________________________________________ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 