Hi All,

Can anyone help me out here?

Sahana Lokeshappa
Test Development Engineer I


From: Varada Kari
Sent: Monday, September 22, 2014 11:52 PM
To: Sage Weil; Sahana Lokeshappa; ceph-us...@ceph.com; ceph-commun...@lists.ceph.com
Subject: RE: [Ceph-community] Pgs are in stale+down+peering state

Hi Sage,

To give more context on this problem:

This cluster has two pools: rbd and a user-created pool (pool1).

osd.12 is also the primary for some other PGs, but the problem occurs only for these three PGs.

$ sudo ceph osd lspools
0 rbd,2 pool1,
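
A quick way to cross-check which PGs have osd.12 as their acting primary (a sketch; the pgs_brief column layout varies across Ceph releases, so check the header line before trusting the field positions):

$ sudo ceph pg dump pgs_brief | awk '$NF == 12 {print $1, $2}'   # pgid and state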

$ sudo ceph -s
    cluster 99ffc4a5-2811-4547-bd65-34c7d4c58758
     health HEALTH_WARN 3 pgs down; 3 pgs peering; 3 pgs stale; 3 pgs stuck inactive; 3 pgs stuck stale; 3 pgs stuck unclean; 1 requests are blocked > 32 sec
     monmap e1: 3 mons at {rack2-ram-1=10.242.42.180:6789/0,rack2-ram-2=10.242.42.184:6789/0,rack2-ram-3=10.242.42.188:6789/0}, election epoch 2008, quorum 0,1,2 rack2-ram-1,rack2-ram-2,rack2-ram-3
     osdmap e17842: 64 osds: 64 up, 64 in
      pgmap v79729: 2148 pgs, 2 pools, 4135 GB data, 1033 kobjects
            12504 GB used, 10971 GB / 23476 GB avail
                2145 active+clean
                   3 stale+down+peering
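
The stuck PGs can also be listed directly (a sketch; subcommand options per the Firefly-era CLI, verify on your release):

$ sudo ceph pg dump_stuck stale
$ sudo ceph pg dump_stuck inactive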

Snippet from pg dump:

2.a9   518  0  0  0  0  2172649472  3001  3001  active+clean        2014-09-22 17:49:35.357586  6826'35762  17842:72706  [12,7,28]   12  [12,7,28]   12  6826'35762  2014-09-22 11:33:55.985449  0'0     2014-09-16 20:11:32.693864
0.59   0    0  0  0  0  0           0     0     active+clean        2014-09-22 17:50:00.751218  0'0         17842:4472   [12,41,2]   12  [12,41,2]   12  0'0         2014-09-22 16:47:09.315499  0'0     2014-09-16 12:20:48.618726
0.4d   0    0  0  0  0  0           4     4     stale+down+peering  2014-09-18 17:51:10.038247  186'4       11134:498    [12,56,27]  12  [12,56,27]  12  186'4       2014-09-18 17:30:32.393188  0'0     2014-09-16 12:20:48.615322
0.49   0    0  0  0  0  0           0     0     stale+down+peering  2014-09-18 17:44:52.681513  0'0         11134:498    [12,6,25]   12  [12,6,25]   12  0'0         2014-09-18 17:16:12.986658  0'0     2014-09-16 12:20:48.614192
0.1c   0    0  0  0  0  0           12    12    stale+down+peering  2014-09-18 17:51:16.735549  186'12      11134:522    [12,25,23]  12  [12,25,23]  12  186'12      2014-09-18 17:16:04.457863  186'10  2014-09-16 14:23:58.731465
2.17   510  0  0  0  0  2139095040  3001  3001  active+clean        2014-09-22 17:52:20.364754  6784'30742  17842:72033  [12,27,23]  12  [12,27,23]  12  6784'30742  2014-09-22 00:19:39.905291  0'0     2014-09-16 20:11:17.016299
2.7e8  508  0  0  0  0  2130706432  3433  3433  active+clean        2014-09-22 17:52:20.365083  6702'21132  17842:64769  [12,25,23]  12  [12,25,23]  12  6702'21132  2014-09-22 17:01:20.546126  0'0     2014-09-16 14:42:32.079187
2.6a5  528  0  0  0  0  2214592512  2840  2840  active+clean        2014-09-22 22:50:38.092084  6775'34416  17842:83221  [12,58,0]   12  [12,58,0]   12  6775'34416  2014-09-22 22:50:38.091989  0'0     2014-09-16 20:11:32.703368

And we couldn’t observe any peering events happening on the primary OSD.

$ sudo ceph pg 0.49 query
Error ENOENT: i don't have pgid 0.49
$ sudo ceph pg 0.4d query
Error ENOENT: i don't have pgid 0.4d
$ sudo ceph pg 0.1c query
Error ENOENT: i don't have pgid 0.1c
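
As a next step (a sketch, assuming the default filestore layout under /var/lib/ceph), we could confirm the mapping the monitors report and check whether the PG directories actually exist on osd.12:

$ sudo ceph pg map 0.49
# expected output of the form:
# osdmap e17842 pg 0.49 (0.49) -> up [12,6,25] acting [12,6,25]

# on the node hosting osd.12:
$ sudo ls /var/lib/ceph/osd/ceph-12/current/ | grep '^0\.49'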

We are not able to explain why peering is stuck. BTW, the rbd pool doesn’t contain any data.

Varada

From: Ceph-community [mailto:ceph-community-boun...@lists.ceph.com] On Behalf 
Of Sage Weil
Sent: Monday, September 22, 2014 10:44 PM
To: Sahana Lokeshappa; ceph-users@lists.ceph.com; ceph-us...@ceph.com; ceph-commun...@lists.ceph.com
Subject: Re: [Ceph-community] Pgs are in stale+down+peering state


Stale means that the primary OSD for the PG went down and its status report is now stale. They all seem to be from osd.12... it seems like something is preventing that OSD from reporting to the mon?
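
One thing to check (a sketch; the admin socket path assumes default packaging, adjust to your deployment): poke osd.12 through its admin socket on the node hosting it, and if the daemon responds, nudge it to re-register with the mons:

$ sudo ceph --admin-daemon /var/run/ceph/ceph-osd.12.asok version
$ sudo ceph --admin-daemon /var/run/ceph/ceph-osd.12.asok dump_ops_in_flight

# if the daemon is alive, marking it down forces it to reassert
# itself to the monitors, which should restart peering:
$ sudo ceph osd down 12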

sage

On September 22, 2014 7:51:48 AM EDT, Sahana Lokeshappa <sahana.lokesha...@sandisk.com> wrote:
Hi all,


I ran the ‘ceph osd thrash’ command, and after all OSDs came back up and in, 3 PGs are stuck in the stale+down+peering state.
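
For context, ‘ceph osd thrash’ tells the monitors to randomly mark OSDs down and out for the given number of osdmap epochs, so it should only ever be run on test clusters. Usage sketch:

$ sudo ceph osd thrash 50   # thrash the cluster for 50 osdmap epochs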


sudo ceph -s
    cluster 99ffc4a5-2811-4547-bd65-34c7d4c58758
     health HEALTH_WARN 3 pgs down; 3 pgs peering; 3 pgs stale; 3 pgs stuck inactive; 3 pgs stuck stale; 3 pgs stuck unclean
     monmap e1: 3 mons at {rack2-ram-1=10.242.42.180:6789/0,rack2-ram-2=10.242.42.184:6789/0,rack2-ram-3=10.242.42.188:6789/0}, election epoch 2008, quorum 0,1,2 rack2-ram-1,rack2-ram-2,rack2-ram-3
     osdmap e17031: 64 osds: 64 up, 64 in
      pgmap v76728: 2148 pgs, 2 pools, 4135 GB data, 1033 kobjects
            12501 GB used, 10975 GB / 23476 GB avail
                2145 active+clean
                   3 stale+down+peering


sudo ceph health detail
HEALTH_WARN 3 pgs down; 3 pgs peering; 3 pgs stale; 3 pgs stuck inactive; 3 pgs stuck stale; 3 pgs stuck unclean
pg 0.4d is stuck inactive for 341048.948643, current state stale+down+peering, last acting [12,56,27]
pg 0.49 is stuck inactive for 341048.948667, current state stale+down+peering, last acting [12,6,25]
pg 0.1c is stuck inactive for 341048.949362, current state stale+down+peering, last acting [12,25,23]
pg 0.4d is stuck unclean for 341048.948665, current state stale+down+peering, last acting [12,56,27]
pg 0.49 is stuck unclean for 341048.948687, current state stale+down+peering, last acting [12,6,25]
pg 0.1c is stuck unclean for 341048.949382, current state stale+down+peering, last acting [12,25,23]
pg 0.4d is stuck stale for 339823.956929, current state stale+down+peering, last acting [12,56,27]
pg 0.49 is stuck stale for 339823.956930, current state stale+down+peering, last acting [12,6,25]
pg 0.1c is stuck stale for 339823.956925, current state stale+down+peering, last acting [12,25,23]




Please, can anyone explain why these PGs are in this state?
Sahana Lokeshappa
Test Development Engineer I
SanDisk Corporation
3rd Floor, Bagmane Laurel, Bagmane Tech Park
C V Raman nagar, Bangalore 560093
T: +918042422283
sahana.lokesha...@sandisk.com




--
Sent from Kaiten Mail. Please excuse my brevity.