Ok, great, glad you got your issue sorted. I’m still battling along with mine.

 

From: Karun Josy [mailto:karunjo...@gmail.com] 
Sent: 13 December 2017 12:22
To: n...@fisk.me.uk
Cc: ceph-users <ceph-users@lists.ceph.com>
Subject: Re: [ceph-users] Health Error : Request Stuck

 

Hi Nick,

 

Finally, I was able to correct the issue!

 

We found that there were many slow requests in ceph health detail, and that some 
OSDs were slowing the cluster down.

 

Initially, the cluster was unusable: there were 10 PGs in "activating+remapped" 
status, along with the slow requests.

The slow requests were mainly on 2 OSDs. We restarted those OSD daemons one by one, 
which cleared the blocked requests.
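
For reference, this was roughly the sequence we used (the OSD id below is only an 
example, not the actual id from our cluster):

ceph health detail              # lists the slow/blocked requests and the OSDs they are stuck on
systemctl restart ceph-osd@2    # restart one implicated OSD daemon (example id)
ceph -s                         # let the cluster settle before restarting the next one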

 

That made the cluster usable again. However, 4 PGs were still in an inactive 
state.

So I took one of the OSDs with slow requests down for some time and allowed the 
cluster to rebalance.

And it worked!
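
Roughly, that step looked like the following. The OSD id is again just an example, 
and the exact commands depend on whether the OSD daemon was stopped or only marked 
out; this sketch assumes marking it out:

ceph osd out 2                  # mark the slow OSD out so data rebalances away from it
ceph -s                         # wait for the PGs to go active and recovery to settle
ceph osd in 2                   # bring the OSD back in afterwards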

 

To be honest, I'm not exactly sure it's the correct way. 

 

P.S.: I had upgraded to Luminous 12.2.2 yesterday. 

 




Karun Josy

 

On Wed, Dec 13, 2017 at 4:31 PM, Nick Fisk <n...@fisk.me.uk> wrote:

Hi Karun,

 

I too am experiencing something very similar, with a PG stuck in the 
activating+remapped state after re-introducing an OSD back into the cluster as 
Bluestore, although this new OSD is not the one listed against the PGs stuck 
activating. I also see the same thing as you, where the up set is different from 
the acting set.

 

Can I just ask what Ceph version you are running, and for the output of ceph osd 
tree?
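
For example, the standard commands:

ceph versions                   # per-daemon Ceph versions (Luminous and later); "ceph -v" per node also works
ceph osd tree                   # CRUSH hierarchy with each OSD's weight and up/down, in/out status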

 

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Karun Josy
Sent: 13 December 2017 07:06
To: ceph-users <ceph-users@lists.ceph.com>
Subject: Re: [ceph-users] Health Error : Request Stuck

 

The cluster is unusable because of inactive PGs. How can we correct it?

 

=============

ceph pg dump_stuck inactive

ok

PG_STAT STATE               UP           UP_PRIMARY ACTING       ACTING_PRIMARY
1.4b    activating+remapped [5,2,0,13,1]          5 [5,2,13,1,4]              5
1.35    activating+remapped [2,7,0,1,12]          2 [2,7,1,12,9]              2
1.12    activating+remapped  [1,3,5,0,7]          1  [1,3,5,7,2]              1
1.4e    activating+remapped  [1,3,0,9,2]          1  [1,3,0,9,5]              1
2.3b    activating+remapped     [13,1,0]         13     [13,1,2]             13
1.19    activating+remapped [2,13,8,9,0]          2 [2,13,8,9,1]              2
1.1e    activating+remapped [2,3,1,10,0]          2 [2,3,1,10,5]              2
2.29    activating+remapped     [1,0,13]          1     [1,8,11]              1
1.6f    activating+remapped [8,2,0,4,13]          8 [8,2,4,13,1]              8
1.74    activating+remapped [7,13,2,0,4]          7 [7,13,2,4,1]              7

====




Karun Josy

 

On Wed, Dec 13, 2017 at 8:27 AM, Karun Josy <karunjo...@gmail.com> wrote:

Hello,

 

We added a new disk to the cluster, and while it rebalances we are getting the 
following health errors.

 

=============
Overall status: HEALTH_ERR
REQUEST_SLOW: 1824 slow requests are blocked > 32 sec
REQUEST_STUCK: 1022 stuck requests are blocked > 4096 sec
=============

 

The load on the servers seems to be very low.

 

How can I correct it?

 

 

Karun 

 

 

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
