----- Message from Gregory Farnum <g...@inktank.com> ---------
   Date: Mon, 15 Sep 2014 10:37:07 -0700
   From: Gregory Farnum <g...@inktank.com>
Subject: Re: [ceph-users] OSD troubles on FS+Tiering
     To: Kenneth Waegeman <kenneth.waege...@ugent.be>
     Cc: ceph-users <ceph-users@lists.ceph.com>


The pidfile bug is already fixed in master/giant branches.

As for the crashing, I'd try killing all the osd processes and turning
them back on again. It might just be some daemon restart failed, or
your cluster could be sufficiently overloaded that the node disks are
going unresponsive and they're suiciding, or...
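
(For example, a minimal sketch of such a restart, assuming the stock sysvinit
script shipped with the 0.8x packages; 'noout' is optional and just avoids
needless rebalancing while the daemons bounce:)

   ceph osd set noout           # keep restarting OSDs from being marked out
   service ceph restart osd     # restart all OSDs defined on this host
   service ceph restart osd.7   # ...or a single daemon
   ceph osd unset noout         # once everything is back up and stable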

I restarted them that way, and they eventually got clean again.
'ceph status' reported that the 'ecdata' pool had too few PGs, so I increased the number of PGs from 128 to 256 (with EC k+m=11).
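
(For reference, a sketch of that change; note that pgp_num has to be raised
separately, which is why the status below still warns 'pg_num 256 > pgp_num 128':)

   ceph osd pool set ecdata pg_num 256
   # data only redistributes once pgp_num is raised to match
   ceph osd pool set ecdata pgp_num 256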
After a few minutes I checked the cluster state again:

[root@ceph001 ~]# ceph status
    cluster 82766e04-585b-49a6-a0ac-c13d9ffd0a7d
     health HEALTH_WARN 100 pgs down; 155 pgs peering; 81 pgs stale; 240 pgs stuck inactive; 81 pgs stuck stale; 240 pgs stuck unclean; 746 requests are blocked > 32 sec; 'cache' at/near target max; pool ecdata pg_num 256 > pgp_num 128
     monmap e1: 3 mons at {ceph001=10.141.8.180:6789/0,ceph002=10.141.8.181:6789/0,ceph003=10.141.8.182:6789/0}, election epoch 8, quorum 0,1,2 ceph001,ceph002,ceph003
     mdsmap e6993: 1/1/1 up {0=ceph003=up:active}, 2 up:standby
     osdmap e11023: 48 osds: 14 up, 14 in
      pgmap v160466: 1472 pgs, 4 pools, 3899 GB data, 2374 kobjects
            624 GB used, 7615 GB / 8240 GB avail
                  75 creating
                1215 active+clean
                 100 down+peering
                   1 active+clean+scrubbing
                  10 stale
                  16 stale+active+clean

Again 34 OSDs are down. This time I do have the error logs, and I looked at a few of them.

From the first host that was marked down:

   -17> 2014-09-16 13:27:49.962938 7f5dfe6a3700 5 osd.7 pg_epoch: 8912 pg[2.b0s3(unlocked)] enter Initial
   -16> 2014-09-16 13:27:50.008842 7f5e02eac700 1 -- 10.143.8.180:6833/53810 <== osd.30 10.141.8.181:0/37396 2524 ==== osd_ping(ping e8912 stamp 2014-09-16 13:27:50.008514) v2 ==== 47+0+0 (3868888299 0 0) 0x18ef7080 con 0x6961600
   -15> 2014-09-16 13:27:50.008892 7f5e02eac700 1 -- 10.143.8.180:6833/53810 --> 10.141.8.181:0/37396 -- osd_ping(ping_reply e8912 stamp 2014-09-16 13:27:50.008514) v2 -- ?+0 0x7326900 con 0x6961600
   -14> 2014-09-16 13:27:50.009159 7f5e046af700 1 -- 10.141.8.180:6847/53810 <== osd.30 10.141.8.181:0/37396 2524 ==== osd_ping(ping e8912 stamp 2014-09-16 13:27:50.008514) v2 ==== 47+0+0 (3868888299 0 0) 0x2210a760 con 0xadd0420
   -13> 2014-09-16 13:27:50.009202 7f5e046af700 1 -- 10.141.8.180:6847/53810 --> 10.141.8.181:0/37396 -- osd_ping(ping_reply e8912 stamp 2014-09-16 13:27:50.008514) v2 -- ?+0 0x14e35a00 con 0xadd0420
   -12> 2014-09-16 13:27:50.034378 7f5dfeea4700 5 osd.7 pg_epoch: 8912 pg[2.71s3( v 8864'33363 (374'30362,8864'33363] local-les=813 n=16075 ec=104 les/c 813/815 805/8912/791) [24,10,8,7,45,27,30,46,38,4,23] r=3 lpr=8912 pi=104-8911/54 crt=8864'33359 inactive NOTIFY] exit Reset 0.127612 1 0.000123
   -11> 2014-09-16 13:27:50.034432 7f5dfeea4700 5 osd.7 pg_epoch: 8912 pg[2.71s3( v 8864'33363 (374'30362,8864'33363] local-les=813 n=16075 ec=104 les/c 813/815 805/8912/791) [24,10,8,7,45,27,30,46,38,4,23] r=3 lpr=8912 pi=104-8911/54 crt=8864'33359 inactive NOTIFY] enter Started
   -10> 2014-09-16 13:27:50.034452 7f5dfeea4700 5 osd.7 pg_epoch: 8912 pg[2.71s3( v 8864'33363 (374'30362,8864'33363] local-les=813 n=16075 ec=104 les/c 813/815 805/8912/791) [24,10,8,7,45,27,30,46,38,4,23] r=3 lpr=8912 pi=104-8911/54 crt=8864'33359 inactive NOTIFY] enter Start
    -9> 2014-09-16 13:27:50.034469 7f5dfeea4700 1 osd.7 pg_epoch: 8912 pg[2.71s3( v 8864'33363 (374'30362,8864'33363] local-les=813 n=16075 ec=104 les/c 813/815 805/8912/791) [24,10,8,7,45,27,30,46,38,4,23] r=3 lpr=8912 pi=104-8911/54 crt=8864'33359 inactive NOTIFY] state<Start>: transitioning to Stray
    -8> 2014-09-16 13:27:50.034491 7f5dfeea4700 5 osd.7 pg_epoch: 8912 pg[2.71s3( v 8864'33363 (374'30362,8864'33363] local-les=813 n=16075 ec=104 les/c 813/815 805/8912/791) [24,10,8,7,45,27,30,46,38,4,23] r=3 lpr=8912 pi=104-8911/54 crt=8864'33359 inactive NOTIFY] exit Start 0.000038 0 0.000000
    -7> 2014-09-16 13:27:50.034521 7f5dfeea4700 5 osd.7 pg_epoch: 8912 pg[2.71s3( v 8864'33363 (374'30362,8864'33363] local-les=813 n=16075 ec=104 les/c 813/815 805/8912/791) [24,10,8,7,45,27,30,46,38,4,23] r=3 lpr=8912 pi=104-8911/54 crt=8864'33359 inactive NOTIFY] enter Started/Stray
    -6> 2014-09-16 13:27:50.034664 7f5dfeea4700 5 osd.7 pg_epoch: 8912 pg[2.7s10( v 8890'35265 (374'32264,8890'35265] local-les=816 n=32002 ec=104 les/c 816/818 805/814/730) [6,30,22,13,39,15,12,5,11,42,7] r=10 lpr=814 pi=104-813/36 luod=0'0 crt=8885'35261 active] exit Started/ReplicaActive/RepNotRecovering 7944.878905 22472 0.038180
    -5> 2014-09-16 13:27:50.034689 7f5dfeea4700 5 osd.7 pg_epoch: 8912 pg[2.7s10( v 8890'35265 (374'32264,8890'35265] local-les=816 n=32002 ec=104 les/c 816/818 805/814/730) [6,30,22,13,39,15,12,5,11,42,7] r=10 lpr=814 pi=104-813/36 luod=0'0 crt=8885'35261 active] exit Started/ReplicaActive 7944.878946 0 0.000000
    -4> 2014-09-16 13:27:50.034711 7f5dfeea4700 5 osd.7 pg_epoch: 8912 pg[2.7s10( v 8890'35265 (374'32264,8890'35265] local-les=816 n=32002 ec=104 les/c 816/818 805/814/730) [6,30,22,13,39,15,12,5,11,42,7] r=10 lpr=814 pi=104-813/36 luod=0'0 crt=8885'35261 active] exit Started 7945.924923 0 0.000000
    -3> 2014-09-16 13:27:50.034732 7f5dfeea4700 5 osd.7 pg_epoch: 8912 pg[2.7s10( v 8890'35265 (374'32264,8890'35265] local-les=816 n=32002 ec=104 les/c 816/818 805/814/730) [6,30,22,13,39,15,12,5,11,42,7] r=10 lpr=814 pi=104-813/36 luod=0'0 crt=8885'35261 active] enter Reset
    -2> 2014-09-16 13:27:50.034869 7f5dfeea4700 5 osd.7 pg_epoch: 8912 pg[2.87s10(unlocked)] enter Initial
    -1> 2014-09-16 13:27:50.042055 7f5e11981700  5 osd.7 8912 tick
0> 2014-09-16 13:27:50.045856 7f5e1015f700 -1 *** Caught signal (Aborted) **
 in thread 7f5e1015f700

 ceph version 0.85 (a0c22842db9eaee9840136784e94e50fabe77187)
 1: /usr/bin/ceph-osd() [0xa72096]
 2: (()+0xf130) [0x7f5e193d7130]
 3: (gsignal()+0x39) [0x7f5e17dd5989]
 4: (abort()+0x148) [0x7f5e17dd7098]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f5e186e89d5]
 6: (()+0x5e946) [0x7f5e186e6946]
 7: (()+0x5e973) [0x7f5e186e6973]
 8: (()+0x5eb9f) [0x7f5e186e6b9f]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1ef) [0xb5e58f]
 10: (GenericObjectMap::list_objects(coll_t const&, ghobject_t, int, std::vector<ghobject_t, std::allocator<ghobject_t> >*, ghobject_t*)+0x45e) [0xa3b1ee]
 11: (KeyValueStore::collection_list_partial(coll_t, ghobject_t, int, int, snapid_t, std::vector<ghobject_t, std::allocator<ghobject_t> >*, ghobject_t*)+0x274) [0x9042d4]
 12: (KeyValueStore::_split_collection(coll_t, unsigned int, unsigned int, coll_t, KeyValueStore::BufferTransaction&)+0x421) [0x91f091]
 13: (KeyValueStore::_do_transaction(ObjectStore::Transaction&, KeyValueStore::BufferTransaction&, ThreadPool::TPHandle*)+0xa4c) [0x920f2c]
 14: (KeyValueStore::_do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long, ThreadPool::TPHandle*)+0x13f) [0x92385f]
 15: (KeyValueStore::_do_op(KeyValueStore::OpSequencer*, ThreadPool::TPHandle&)+0xac) [0x923a7c]
 16: (ThreadPool::worker(ThreadPool::WorkThread*)+0xb10) [0xb4ef50]
 17: (ThreadPool::WorkThread::entry()+0x10) [0xb50040]
 18: (()+0x7df3) [0x7f5e193cfdf3]
 19: (clone()+0x6d) [0x7f5e17e963dd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
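
(With matching debuginfo installed, the addresses in such a trace can be
resolved roughly like this; a sketch, assuming /usr/bin/ceph-osd plus the
ceph-debuginfo package for this build:)

   # map a frame address (e.g. frame 10 above) to function and file:line
   addr2line -e /usr/bin/ceph-osd -f -C 0xa3b1ee
   # or dump the disassembly with source interleaved, as the NOTE suggests
   objdump -rdS /usr/bin/ceph-osd > ceph-osd.asm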


On some of the hosts that crashed afterwards:

=0 cs=0 l=1 c=0x139ed280).connect error 10.141.8.181:6822/36408, (111) Connection refused
    -7> 2014-09-16 13:28:36.028858 7f74eb007700 2 -- 10.141.8.180:0/52318 >> 10.141.8.181:6822/36408 pipe(0x18147bc0 sd=41 :0 s=1 pgs=0 cs=0 l=1 c=0x139ed280).fault (111) Connection refused
    -6> 2014-09-16 13:28:36.029423 7f74e4c96700 2 -- 10.141.8.180:0/52318 >> 10.143.8.181:6815/36408 pipe(0x18147640 sd=81 :0 s=1 pgs=0 cs=0 l=1 c=0x139e91e0).connect error 10.143.8.181:6815/36408, (111) Connection refused
    -5> 2014-09-16 13:28:36.029443 7f74e4c96700 2 -- 10.141.8.180:0/52318 >> 10.143.8.181:6815/36408 pipe(0x18147640 sd=81 :0 s=1 pgs=0 cs=0 l=1 c=0x139e91e0).fault (111) Connection refused
    -4> 2014-09-16 13:28:36.101914 7f7509534700 1 -- 10.143.8.180:6801/52318 <== osd.32 10.141.8.182:0/54784 2520 ==== osd_ping(ping e8964 stamp 2014-09-16 13:28:36.101604) v2 ==== 47+0+0 (411091961 0 0) 0x189b50a0 con 0x14a0f7a0
    -3> 2014-09-16 13:28:36.101952 7f7509534700 1 -- 10.143.8.180:6801/52318 --> 10.141.8.182:0/54784 -- osd_ping(ping_reply e8941 stamp 2014-09-16 13:28:36.101604) v2 -- ?+0 0x1a0feea0 con 0x14a0f7a0
    -2> 2014-09-16 13:28:36.101950 7f750ad37700 1 -- 10.141.8.180:6801/52318 <== osd.32 10.141.8.182:0/54784 2520 ==== osd_ping(ping e8964 stamp 2014-09-16 13:28:36.101604) v2 ==== 47+0+0 (411091961 0 0) 0x1178cce0 con 0x143944c0
    -1> 2014-09-16 13:28:36.102005 7f750ad37700 1 -- 10.141.8.180:6801/52318 --> 10.141.8.182:0/54784 -- osd_ping(ping_reply e8941 stamp 2014-09-16 13:28:36.101604) v2 -- ?+0 0x14b0f440 con 0x143944c0
     0> 2014-09-16 13:28:36.183818 7f751681f700 -1 os/GenericObjectMap.cc: In function 'int GenericObjectMap::list_objects(const coll_t&, ghobject_t, int, std::vector<ghobject_t>*, ghobject_t*)' thread 7f751681f700 time 2014-09-16 13:28:36.181333
os/GenericObjectMap.cc: 1094: FAILED assert(start <= header.oid)

 ceph version 0.85 (a0c22842db9eaee9840136784e94e50fabe77187)
 1: (GenericObjectMap::list_objects(coll_t const&, ghobject_t, int, std::vector<ghobject_t, std::allocator<ghobject_t> >*, ghobject_t*)+0x45e) [0xa3b1ee]
 2: (KeyValueStore::collection_list_partial(coll_t, ghobject_t, int, int, snapid_t, std::vector<ghobject_t, std::allocator<ghobject_t> >*, ghobject_t*)+0x274) [0x9042d4]
 3: (KeyValueStore::_split_collection(coll_t, unsigned int, unsigned int, coll_t, KeyValueStore::BufferTransaction&)+0x421) [0x91f091]
 4: (KeyValueStore::_do_transaction(ObjectStore::Transaction&, KeyValueStore::BufferTransaction&, ThreadPool::TPHandle*)+0xa4c) [0x920f2c]
 5: (KeyValueStore::_do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long, ThreadPool::TPHandle*)+0x13f) [0x92385f]
 6: (KeyValueStore::_do_op(KeyValueStore::OpSequencer*, ThreadPool::TPHandle&)+0xac) [0x923a7c]
 7: (ThreadPool::worker(ThreadPool::WorkThread*)+0xb10) [0xb4ef50]
 8: (ThreadPool::WorkThread::entry()+0x10) [0xb50040]
 9: (()+0x7df3) [0x7f7520317df3]
 10: (clone()+0x6d) [0x7f751edde3dd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.


I tried to restart the crashed OSDs again, but they now always fail instantly with the above stack trace.
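
(To capture more context on a failed start, one could run a single affected
daemon in the foreground with higher debug levels; a sketch, using osd.7 from
the trace above, and assuming the keyvaluestore debug subsystem is present in
this build:)

   ceph-osd -i 7 -f --debug-osd 20 --debug-ms 1 --debug-keyvaluestore 20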

Any ideas about this?
Thanks a lot!
Kenneth

-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Mon, Sep 15, 2014 at 5:43 AM, Kenneth Waegeman
<kenneth.waege...@ugent.be> wrote:
Hi,

I have some strange OSD problems. Before the weekend I started some rsync
tests over CephFS, onto a cache pool with an underlying EC KeyValueStore pool
(a sketch of how such a tier is typically set up follows the status output
below). Today the cluster is completely degraded:

[root@ceph003 ~]# ceph status
    cluster 82766e04-585b-49a6-a0ac-c13d9ffd0a7d
     health HEALTH_WARN 19 pgs backfill_toofull; 403 pgs degraded; 168 pgs
down; 8 pgs incomplete; 168 pgs peering; 61 pgs stale; 403 pgs stuck
degraded; 176 pgs stuck inactive; 61 pgs stuck stale; 589 pgs stuck unclean;
403 pgs stuck undersized; 403 pgs undersized; 300 requests are blocked > 32
sec; recovery 15170/27902361 objects degraded (0.054%); 1922/27902361
objects misplaced (0.007%); 1 near full osd(s)
     monmap e1: 3 mons at
{ceph001=10.141.8.180:6789/0,ceph002=10.141.8.181:6789/0,ceph003=10.141.8.182:6789/0},
election epoch 8, quorum 0,1,2 ceph001,ceph002,ceph003
     mdsmap e5: 1/1/1 up {0=ceph003=up:active}, 2 up:standby
     osdmap e719: 48 osds: 18 up, 18 in
      pgmap v144887: 1344 pgs, 4 pools, 4139 GB data, 2624 kobjects
            2282 GB used, 31397 GB / 33680 GB avail
            15170/27902361 objects degraded (0.054%); 1922/27902361 objects
misplaced (0.007%)
                  68 down+remapped+peering
                   1 active
                 754 active+clean
                   1 stale+incomplete
                   1 stale+active+clean+scrubbing
                  14 active+undersized+degraded+remapped
                   7 incomplete
                 100 down+peering
                   9 active+remapped
                  59 stale+active+undersized+degraded
                  19 active+undersized+degraded+remapped+backfill_toofull
                 311 active+undersized+degraded
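
(For context, the cache tier over an EC pool mentioned above is normally wired
up roughly as below; a sketch only: the profile name and the k=8/m=3 split are
assumptions chosen just to give k+m=11, and the KV part is the OSD backend
selected via 'osd objectstore' in ceph.conf rather than a pool option:)

   ceph osd erasure-code-profile set ecprofile11 k=8 m=3
   ceph osd pool create ecdata 128 128 erasure ecprofile11
   ceph osd pool create cache 128 128
   ceph osd tier add ecdata cache
   ceph osd tier cache-mode cache writeback
   ceph osd tier set-overlay ecdata cache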

I tried to figure out what happened in the global logs:

2014-09-13 08:01:19.433313 mon.0 10.141.8.180:6789/0 66076 : [INF] pgmap
v65892: 1344 pgs: 1344 active+clean; 2606 GB data, 3116 GB used, 126 TB /
129 TB avail; 4159 kB/s wr, 45 op/s
2014-09-13 08:01:20.443019 mon.0 10.141.8.180:6789/0 66078 : [INF] pgmap
v65893: 1344 pgs: 1344 active+clean; 2606 GB data, 3116 GB used, 126 TB /
129 TB avail; 561 kB/s wr, 11 op/s
2014-09-13 08:01:20.777988 mon.0 10.141.8.180:6789/0 66081 : [INF] osd.19
10.141.8.181:6809/29664 failed (3 reports from 3 peers after 20.000079 >=
grace 20.000000)
2014-09-13 08:01:21.455887 mon.0 10.141.8.180:6789/0 66083 : [INF] osdmap
e117: 48 osds: 47 up, 48 in
2014-09-13 08:01:21.462084 mon.0 10.141.8.180:6789/0 66084 : [INF] pgmap
v65894: 1344 pgs: 1344 active+clean; 2606 GB data, 3116 GB used, 126 TB /
129 TB avail; 1353 kB/s wr, 13 op/s
2014-09-13 08:01:21.477007 mon.0 10.141.8.180:6789/0 66085 : [INF] pgmap
v65895: 1344 pgs: 187 stale+active+clean, 1157 active+clean; 2606 GB data,
3116 GB used, 126 TB / 129 TB avail; 2300 kB/s wr, 21 op/s
2014-09-13 08:01:22.456055 mon.0 10.141.8.180:6789/0 66086 : [INF] osdmap
e118: 48 osds: 47 up, 48 in
2014-09-13 08:01:22.462590 mon.0 10.141.8.180:6789/0 66087 : [INF] pgmap
v65896: 1344 pgs: 187 stale+active+clean, 1157 active+clean; 2606 GB data,
3116 GB used, 126 TB / 129 TB avail; 13686 kB/s wr, 5 op/s
2014-09-13 08:01:23.464302 mon.0 10.141.8.180:6789/0 66088 : [INF] pgmap
v65897: 1344 pgs: 187 stale+active+clean, 1157 active+clean; 2606 GB data,
3116 GB used, 126 TB / 129 TB avail; 11075 kB/s wr, 4 op/s
2014-09-13 08:01:24.477467 mon.0 10.141.8.180:6789/0 66089 : [INF] pgmap
v65898: 1344 pgs: 187 stale+active+clean, 1157 active+clean; 2606 GB data,
3116 GB used, 126 TB / 129 TB avail; 4932 kB/s wr, 38 op/s
2014-09-13 08:01:25.481027 mon.0 10.141.8.180:6789/0 66090 : [INF] pgmap
v65899: 1344 pgs: 187 stale+active+clean, 1157 active+clean; 2606 GB data,
3116 GB used, 126 TB / 129 TB avail; 5726 kB/s wr, 64 op/s
2014-09-13 08:01:19.336173 osd.1 10.141.8.180:6803/26712 54442 : [WRN] 1
slow requests, 1 included below; oldest blocked for > 30.000137 secs
2014-09-13 08:01:19.336341 osd.1 10.141.8.180:6803/26712 54443 : [WRN] slow
request 30.000137 seconds old, received at 2014-09-13 08:00:49.335339:
osd_op(client.7448.1:17751783 10000203eac.0000000e [write 0~319488
[1@-1],startsync 0~0] 1.b6c3a3a9 snapc 1=[] ondisk+write e116) currently reached pg
2014-09-13 08:01:20.337602 osd.1 10.141.8.180:6803/26712 54444 : [WRN] 7
slow requests, 6 included below; oldest blocked for > 31.001947 secs
2014-09-13 08:01:20.337688 osd.1 10.141.8.180:6803/26712 54445 : [WRN] slow
request 30.998110 seconds old, received at 2014-09-13 08:00:49.339176:
osd_op(client.7448.1:17751787 10000203eac.0000000e [write 319488~65536
[1@-1],startsync 0~0]


This is happening OSD after OSD.

I tried to check the individual OSD logs, but they all stop abruptly (even
for the OSDs that are still running):

2014-09-12 14:25:51.205276 7f3517209700  0 log [WRN] : 41 slow requests, 1
included below; oldest blocked for > 38.118088 secs
2014-09-12 14:25:51.205337 7f3517209700  0 log [WRN] : slow request
36.558286 seconds old, received at 2014-09-12 14:25:14.646836:
osd_op(client.7448.1:2458392 1000006328f.0000000b [write 3989504~204800
[1@-1],startsync 0~0] 1.9337bf4b snapc 1=[] ondisk+write e116) currently
reached pg
2014-09-12 14:25:53.205586 7f3517209700  0 log [WRN] : 30 slow requests, 1
included below; oldest blocked for > 40.118530 secs
2014-09-12 14:25:53.205679 7f3517209700  0 log [WRN] : slow request
30.541026 seconds old, received at 2014-09-12 14:25:22.664538:
osd_op(client.7448.1:2460291 100000632b7.00000000 [write 0~691
[1@-1],startsync 0~0] 1.994248a8 snapc 1=[] ondisk+write e116) currently
reached pg
2014-09-12 17:52:40.503917 7f34e8ed2700  0 -- 10.141.8.181:6809/29664 >>
10.141.8.181:6847/62389 pipe(0x247ce040 sd=327 :6809 s=0 pgs=0 cs=0 l=1
c=0x1bc8b9c0).accept replacing existing (lossy) channel (new one lossy=1)

I *think* the absence of the logs is related to another issue I just found
(http://tracker.ceph.com/issues/9470).

So I can't find the original problem in the log files.

Is there any other way I can find out what caused 30 OSDs to start crashing?
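
(Two places that tend to survive even when the OSD logs do not; a sketch,
run on an OSD host, assuming the daemons' admin sockets are in their default
location:)

   # kernel-side symptoms (hung disks, I/O errors, OOM kills) end up here
   dmesg | grep -iE 'blocked|i/o error|oom'
   # ask a still-running OSD what it is doing right now / recently
   ceph daemon osd.1 dump_ops_in_flight
   ceph daemon osd.1 dump_historic_ops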

Thanks!!

Kenneth



----- End message from Gregory Farnum <g...@inktank.com> -----

--

Met vriendelijke groeten,
Kenneth Waegeman


_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
