Any pointers on fixing incomplete PGs would be greatly appreciated.

I tried the following, with no success:

pg scrub
pg deep-scrub
pg repair
osd out, down, rm, in
osd lost
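For reference, the exact CLI forms of the commands above (shown with PG 10.70 and osd.88 as sample ids; substitute your own — this is a sketch of what was run, not a fix recipe):

```shell
# Commands attempted, run from an admin/monitor node.
ceph pg scrub 10.70
ceph pg deep-scrub 10.70
ceph pg repair 10.70
ceph osd out 88
ceph osd down 88
ceph osd rm 88          # only succeeds once the OSD is down/out
ceph osd in 88
ceph osd lost 88 --yes-i-really-mean-it
```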



# ceph -s
    cluster 2bd3283d-67ef-4316-8b7e-d8f4747eae33
     health HEALTH_WARN 7 pgs down; 20 pgs incomplete; 1 pgs recovering; 20 pgs 
stuck inactive; 21 pgs stuck unclean; 4 requests are blocked > 32 sec; recovery 
201/986658 objects degraded (0.020%); 133/328886 unfound (0.040%)
     monmap e3: 3 mons at 
{pouta-s01=xx.xx.xx.1:6789/0,pouta-s02=xx.xx.xx.2:6789/0,pouta-s03=xx.xx.xx.3:6789/0},
 election epoch 1920, quorum 0,1,2 pouta-s01,pouta-s02,pouta-s03
     osdmap e262813: 239 osds: 239 up, 239 in
      pgmap v588073: 18432 pgs, 13 pools, 2338 GB data, 321 kobjects
            19094 GB used, 849 TB / 868 TB avail
            201/986658 objects degraded (0.020%); 133/328886 unfound (0.040%)
                   7 down+incomplete
               18411 active+clean
                  13 incomplete
                   1 active+recovering



# ceph pg dump_stuck inactive
ok
pg_stat objects mip     degr    unf     bytes   log     disklog state   
state_stamp     v       reported        up      up_primary      acting  
acting_primary  last_scrub      scrub_stamp     last_deep_scrub deep_scrub_stamp
10.70   0       0       0       0       0       0       0       incomplete      
2015-04-01 21:21:16.152179      0'0     262813:163      [213,88,80]     213     
[213,88,80]     213     0'0     2015-03-12 17:59:43.275049      0'0     
2015-03-09 17:55:58.745662
3.dde   68      66      0       66      552861709       297     297     
down+incomplete 2015-04-01 21:21:16.161066      33547'297       262813:230683   
[174,5,179]     174     [174,5,179]     174     33547'297       2015-03-12 
14:19:15.261595      28522'43        2015-03-11 14:19:13.894538
5.a2    0       0       0       0       0       0       0       incomplete      
2015-04-01 21:21:16.145329      0'0     262813:150      [168,182,201]   168     
[168,182,201]   168     0'0     2015-03-12 17:58:29.257085      0'0     
2015-03-09 17:55:07.684377
13.1b6  0       0       0       0       0       0       0       incomplete      
2015-04-01 21:21:16.139062      0'0     262813:2974     [0,176,131]     0       
[0,176,131]     0       0'0     2015-03-12 18:00:13.286920      0'0     
2015-03-09 17:56:18.715208
7.25b   0       0       0       0       0       0       0       incomplete      
2015-04-01 21:21:16.113876      0'0     262813:167      [111,26,108]    111     
[111,26,108]    111     27666'16        2015-03-12 17:59:06.357864      2330'3  
2015-03-09 17:55:30.754522
5.19    0       0       0       0       0       0       0       down+incomplete 
2015-04-01 21:21:16.199712      0'0     262813:27605    [212,43,131]    212     
[212,43,131]    212     0'0     2015-03-12 13:51:37.777026      0'0     
2015-03-11 13:51:35.406246
3.a2f   68      0       0       0       543686693       302     302     
incomplete      2015-04-01 21:21:16.141368      33531'302       262813:3731     
[149,224,33]    149     [149,224,33]    149     33531'302       2015-03-12 
14:17:43.045627      28564'54        2015-03-11 14:17:40.314189
7.298   0       0       0       0       0       0       0       incomplete      
2015-04-01 21:21:16.108523      0'0     262813:166      [221,154,225]   221     
[221,154,225]   221     27666'13        2015-03-12 17:59:10.308423      2330'4  
2015-03-09 17:55:35.750109
1.1e7   0       0       0       0       0       0       0       incomplete      
2015-04-01 21:21:16.192711      0'0     262813:162      [215,232]       215     
[215,232]       215     0'0     2015-03-12 17:55:45.203232      0'0     
2015-03-09 17:53:49.694822
3.774   79      0       0       0       645136397       339     339     
down+incomplete 2015-04-01 21:21:16.207131      33570'339       262813:168986   
[162,39,161]    162     [162,39,161]    162     33570'339       2015-03-12 
14:49:03.869447      2226'2  2015-03-09 13:46:49.783950
3.7d0   78      0       0       0       609222686       376     376     
down+incomplete 2015-04-01 21:21:16.135599      33538'376       262813:185045   
[117,118,177]   117     [117,118,177]   117     33538'376       2015-03-12 
13:51:03.984454      28394'62        2015-03-11 13:50:58.196288
3.d60   0       0       0       0       0       0       0       incomplete      
2015-04-01 21:21:16.158179      0'0     262813:169      [60,56,220]     60      
[60,56,220]     60      33552'321       2015-03-12 13:44:43.502907      
28356'39        2015-03-11 13:44:41.663482
4.1fc   0       0       0       0       0       0       0       incomplete      
2015-04-01 21:21:16.217291      0'0     262813:163      [144,58,153]    144     
[144,58,153]    144     0'0     2015-03-12 17:58:19.254170      0'0     
2015-03-09 17:54:55.720479
3.e02   72      0       0       0       585105425       304     304     
down+incomplete 2015-04-01 21:21:16.099150      33568'304       262813:169744   
[15,102,147]    15      [15,102,147]    15      33568'304       2015-03-16 
10:04:19.894789      2246'4  2015-03-09 11:43:44.176331
8.1d4   0       0       0       0       0       0       0       down+incomplete 
2015-04-01 21:21:16.218644      0'0     262813:21867    [126,43,174]    126     
[126,43,174]    126     0'0     2015-03-12 14:34:35.258338      0'0     
2015-03-12 14:34:35.258338
4.2f4   0       0       0       0       0       0       0       down+incomplete 
2015-04-01 21:21:16.117515      0'0     262813:116150   [181,186,13]    181     
[181,186,13]    181     0'0     2015-03-12 14:59:03.529264      0'0     
2015-03-09 13:46:40.601301
3.e5a   76      70      0       0       623902741       325     325     
incomplete      2015-04-01 21:21:16.043300      33569'325       262813:73426    
[97,22,62]      97      [97,22,62]      97      33569'325       2015-03-12 
13:58:05.813966      28433'44        2015-03-11 13:57:53.909795
8.3a0   0       0       0       0       0       0       0       incomplete      
2015-04-01 21:21:16.056437      0'0     262813:175168   [62,14,224]     62      
[62,14,224]     62      0'0     2015-03-12 13:52:44.546418      0'0     
2015-03-12 13:52:44.546418
3.24e   0       0       0       0       0       0       0       incomplete      
2015-04-01 21:21:16.130831      0'0     262813:165      [39,202,90]     39      
[39,202,90]     39      33556'272       2015-03-13 11:44:41.263725      2327'4  
2015-03-09 17:54:43.675552
5.f7    0       0       0       0       0       0       0       incomplete      
2015-04-01 21:21:16.145298      0'0     262813:153      [54,193,123]    54      
[54,193,123]    54      0'0     2015-03-12 17:58:30.257371      0'0     
2015-03-09 17:55:11.725629
[root@pouta-s01 ceph]#


##########  Example 1 : PG 10.70 ###########


10.70   0       0       0       0       0       0       0       incomplete      
2015-04-01 21:21:16.152179      0'0     262813:163      [213,88,80]     213     
[213,88,80]     213     0'0     2015-03-12 17:59:43.275049      0'0     
2015-03-09 17:55:58.745662


This is how I found the location of each OSD:

[root@pouta-s01 ceph]# ceph osd find 88

{ "osd": 88,
  "ip": "10.100.50.3:7079\/916853",
  "crush_location": { "host": "pouta-s03",
      "root": "default"}}
[root@pouta-s01 ceph]#
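When checking many OSDs, pulling just the host out of the `ceph osd find` JSON saves some reading. A quick-and-dirty sketch (`osd_host` is a hypothetical helper; a real JSON parser would be more robust than sed):

```shell
# Extract the host an OSD lives on from "ceph osd find" JSON on stdin.
osd_host() {
    sed -n 's/.*"host": "\([^"]*\)".*/\1/p'
}

# On the cluster (run from an admin node):
# ceph osd find 88 | osd_host      # should print pouta-s03
```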


When I manually check the PG's _head directory under current/, the data is not 
present (i.e. the data is lost from all the copies):


[root@pouta-s04 current]# ls -l /var/lib/ceph/osd/ceph-80/current/10.70_head
total 0
[root@pouta-s04 current]#


On some of the OSDs, the _head directory does not even exist:

[root@pouta-s03 ~]# ls -l /var/lib/ceph/osd/ceph-88/current/10.70_head
ls: cannot access /var/lib/ceph/osd/ceph-88/current/10.70_head: No such file or 
directory
[root@pouta-s03 ~]#

[root@pouta-s02 ~]# ls -l /var/lib/ceph/osd/ceph-213/current/10.70_head
total 0
[root@pouta-s02 ~]#


# ceph pg 10.70 query  --->  http://paste.ubuntu.com/10719840/
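To get at the relevant part of that (large) query output, the recovery_state section can be filtered out, as Craig suggested below. A sketch, assuming python3 is available on the admin node (`recovery_state` is a hypothetical helper name):

```shell
# Print only the recovery_state section of "ceph pg <pgid> query" JSON,
# which describes what Ceph still needs in order to finish recovery.
recovery_state() {
    python3 -c 'import json,sys; print(json.dumps(json.load(sys.stdin).get("recovery_state", []), indent=2))'
}

# On the cluster:
# ceph pg 10.70 query | recovery_state
```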


##########  Example 2 : PG 3.7d0 ###########

3.7d0   78      0       0       0       609222686       376     376     
down+incomplete 2015-04-01 21:21:16.135599      33538'376       262813:185045   
[117,118,177]   117     [117,118,177]   117     33538'376       2015-03-12 
13:51:03.984454      28394'62        2015-03-11 13:50:58.196288


[root@pouta-s04 current]# ceph pg map 3.7d0
osdmap e262813 pg 3.7d0 (3.7d0) -> up [117,118,177] acting [117,118,177]
[root@pouta-s04 current]#
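To repeat the per-OSD checks for every member of a PG's acting set, the id list can be parsed out of that `ceph pg map` line. A sketch (`parse_acting` is a hypothetical helper, not a ceph command):

```shell
# Extract the acting-set OSD ids from a "ceph pg map" output line, e.g.
#   osdmap e262813 pg 3.7d0 (3.7d0) -> up [117,118,177] acting [117,118,177]
parse_acting() {
    sed -n 's/.*acting \[\([0-9,]*\)\].*/\1/p' | tr ',' ' '
}

# On the cluster (run from an admin node):
# for osd in $(ceph pg map 3.7d0 | parse_acting); do
#     ceph osd find "$osd"
# done
```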


Data is present here, so 1 copy out of 3 is present:

[root@pouta-s04 current]# ls -l /var/lib/ceph/osd/ceph-117/current/3.7d0_head/ 
| wc -l
63
[root@pouta-s04 current]#



[root@pouta-s03 ~]#  ls -l /var/lib/ceph/osd/ceph-118/current/3.7d0_head/
total 0
[root@pouta-s03 ~]#


[root@pouta-s01 ceph]# ceph osd find 177
{ "osd": 177,
  "ip": "10.100.50.2:7062\/777799",
  "crush_location": { "host": "pouta-s02",
      "root": "default"}}
[root@pouta-s01 ceph]#

The directory is not even present here:

[root@pouta-s02 ~]#  ls -l /var/lib/ceph/osd/ceph-177/current/3.7d0_head/
ls: cannot access /var/lib/ceph/osd/ceph-177/current/3.7d0_head/: No such file 
or directory
[root@pouta-s02 ~]#


# ceph pg 3.7d0 query  --->  http://paste.ubuntu.com/10720107/


- Karan -

> On 20 Mar 2015, at 22:43, Craig Lewis <cle...@centraldesktop.com> wrote:
> 
> > osdmap e261536: 239 osds: 239 up, 238 in
> 
> Why is that last OSD not IN?  The history you need is probably there.
> 
> Run  ceph pg <pgid> query on some of the stuck PGs.  Look for the 
> recovery_state section.  That should tell you what Ceph needs to complete the 
> recovery.
> 
> 
> If you need more help, post the output of a couple pg queries.
> 
> 
> 
> On Fri, Mar 20, 2015 at 4:22 AM, Karan Singh <karan.si...@csc.fi 
> <mailto:karan.si...@csc.fi>> wrote:
> Hello Guys
> 
> My Ceph cluster lost data and now it's not recovering. This problem occurred 
> while Ceph performed recovery when one of the nodes was down. 
> Now all the nodes are up, but Ceph is showing PGs as incomplete, unclean, and 
> recovering.
> 
> 
> I have tried several things to recover them: scrub, deep-scrub, pg repair, 
> changing primary affinity and then scrubbing, osd_pool_default_size, etc., 
> but no luck.
> 
> Could you please advise how to recover the PGs and achieve HEALTH_OK.
> 
> # ceph -s
>     cluster 2bd3283d-67ef-4316-8b7e-d8f4747eae33
>      health HEALTH_WARN 19 pgs incomplete; 3 pgs recovering; 20 pgs stuck 
> inactive; 23 pgs stuck unclean; 2 requests are blocked > 32 sec; recovery 
> 531/980676 objects degraded (0.054%); 243/326892 unfound (0.074%)
>      monmap e3: 3 mons at 
> {xxx=xxxx:6789/0,xxx=xxxx:6789:6789/0,xxx=xxxx:6789:6789/0}, election epoch 
> 1474, quorum 0,1,2 xx,xx,xx
>      osdmap e261536: 239 osds: 239 up, 238 in
>       pgmap v415790: 18432 pgs, 13 pools, 2330 GB data, 319 kobjects
>             20316 GB used, 844 TB / 864 TB avail
>             531/980676 objects degraded (0.054%); 243/326892 unfound (0.074%)
>                    1 creating
>                18409 active+clean
>                    3 active+recovering
>                   19 incomplete
> 
> 
> 
> 
> # ceph pg dump_stuck unclean
> ok
> pg_stat       objects mip     degr    unf     bytes   log     disklog state   
> state_stamp     v       reported        up      up_primary      acting  
> acting_primary  last_scrub      scrub_stamp     last_deep_scrub 
> deep_scrub_stamp
> 10.70 0       0       0       0       0       0       0       incomplete      
> 2015-03-20 12:19:49.534911      0'0     261536:1015     [153,140,80]    153   
>   [153,140,80]    153     0'0     2015-03-12 17:59:43.275049      0'0     
> 2015-03-09 17:55:58.745662
> 3.dde 68      66      0       66      552861709       297     297     
> incomplete      2015-03-20 12:19:49.584839      33547'297       261536:228352 
>   [174,5,179]     174     [174,5,179]     174     33547'297       2015-03-12 
> 14:19:15.261595      28522'43        2015-03-11 14:19:13.894538
> 5.a2  0       0       0       0       0       0       0       incomplete      
> 2015-03-20 12:19:49.560756      0'0     261536:897      [214,191,170]   214   
>   [214,191,170]   214     0'0     2015-03-12 17:58:29.257085      0'0     
> 2015-03-09 17:55:07.684377
> 13.1b6        0       0       0       0       0       0       0       
> incomplete      2015-03-20 12:19:49.846253      0'0     261536:1050     
> [0,176,131]     0       [0,176,131]     0       0'0     2015-03-12 
> 18:00:13.286920      0'0     2015-03-09 17:56:18.715208
> 7.25b 16      0       0       0       67108864        16      16      
> incomplete      2015-03-20 12:19:49.639102      27666'16        261536:4777   
>   [194,145,45]    194     [194,145,45]    194     27666'16        2015-03-12 
> 17:59:06.357864      2330'3  2015-03-09 17:55:30.754522
> 5.19  0       0       0       0       0       0       0       incomplete      
> 2015-03-20 12:19:49.742698      0'0     261536:25410    [212,43,131]    212   
>   [212,43,131]    212     0'0     2015-03-12 13:51:37.777026      0'0     
> 2015-03-11 13:51:35.406246
> 3.a2f 0       0       0       0       0       0       0       creating        
> 2015-03-20 12:42:15.586372      0'0     0:0     []      -1      []      -1    
>   0'0     0.000000        0'0     0.000000
> 7.298 0       0       0       0       0       0       0       incomplete      
> 2015-03-20 12:19:49.566966      0'0     261536:900      [187,95,225]    187   
>   [187,95,225]    187     27666'13        2015-03-12 17:59:10.308423      
> 2330'4  2015-03-09 17:55:35.750109
> 3.a5a 77      87      261     87      623902741       325     325     
> active+recovering       2015-03-20 10:54:57.443670      33569'325       
> 261536:182464   [150,149,181]   150     [150,149,181]   150     33569'325     
>   2015-03-12 13:58:05.813966      28433'44        2015-03-11 13:57:53.909795
> 1.1e7 0       0       0       0       0       0       0       incomplete      
> 2015-03-20 12:19:49.610547      0'0     261536:772      [175,182]       175   
>   [175,182]       175     0'0     2015-03-12 17:55:45.203232      0'0     
> 2015-03-09 17:53:49.694822
> 3.774 79      0       0       0       645136397       339     339     
> incomplete      2015-03-20 12:19:49.821708      33570'339       261536:166857 
>   [162,39,161]    162     [162,39,161]    162     33570'339       2015-03-12 
> 14:49:03.869447      2226'2  2015-03-09 13:46:49.783950
> 3.7d0 78      0       0       0       609222686       376     376     
> incomplete      2015-03-20 12:19:49.534004      33538'376       261536:182810 
>   [117,118,177]   117     [117,118,177]   117     33538'376       2015-03-12 
> 13:51:03.984454      28394'62        2015-03-11 13:50:58.196288
> 3.d60 0       0       0       0       0       0       0       incomplete      
> 2015-03-20 12:19:49.647196      0'0     261536:833      [154,172,1]     154   
>   [154,172,1]     154     33552'321       2015-03-12 13:44:43.502907      
> 28356'39        2015-03-11 13:44:41.663482
> 4.1fc 0       0       0       0       0       0       0       incomplete      
> 2015-03-20 12:19:49.610103      0'0     261536:1069     [70,179,58]     70    
>   [70,179,58]     70      0'0     2015-03-12 17:58:19.254170      0'0     
> 2015-03-09 17:54:55.720479
> 3.e02 72      0       0       0       585105425       304     304     
> incomplete      2015-03-20 12:19:49.564768      33568'304       261536:167428 
>   [15,102,147]    15      [15,102,147]    15      33568'304       2015-03-16 
> 10:04:19.894789      2246'4  2015-03-09 11:43:44.176331
> 8.1d4 0       0       0       0       0       0       0       incomplete      
> 2015-03-20 12:19:49.614727      0'0     261536:19611    [126,43,174]    126   
>   [126,43,174]    126     0'0     2015-03-12 14:34:35.258338      0'0     
> 2015-03-12 14:34:35.258338
> 4.2f4 0       0       0       0       0       0       0       incomplete      
> 2015-03-20 12:19:49.595109      0'0     261536:113791   [181,186,13]    181   
>   [181,186,13]    181     0'0     2015-03-12 14:59:03.529264      0'0     
> 2015-03-09 13:46:40.601301
> 3.52c 65      23      69      23      543162368       290     290     
> active+recovering       2015-03-20 10:51:43.664734      33553'290       
> 261536:8431     [212,100,219]   212     [212,100,219]   212     33553'290     
>   2015-03-13 11:44:26.396514      29686'103       2015-03-11 17:18:33.452616
> 3.e5a 76      70      0       0       623902741       325     325     
> incomplete      2015-03-20 12:19:49.552071      33569'325       261536:71248  
>   [97,22,62]      97      [97,22,62]      97      33569'325       2015-03-12 
> 13:58:05.813966      28433'44        2015-03-11 13:57:53.909795
> 8.3a0 0       0       0       0       0       0       0       incomplete      
> 2015-03-20 12:19:49.615728      0'0     261536:173184   [62,14,178]     62    
>   [62,14,178]     62      0'0     2015-03-12 13:52:44.546418      0'0     
> 2015-03-12 13:52:44.546418
> 3.24e 0       0       0       0       0       0       0       incomplete      
> 2015-03-20 12:19:49.591282      0'0     261536:1026     [103,14,90]     103   
>   [103,14,90]     103     33556'272       2015-03-13 11:44:41.263725      
> 2327'4  2015-03-09 17:54:43.675552
> 5.f7  0       0       0       0       0       0       0       incomplete      
> 2015-03-20 12:19:49.667823      0'0     261536:853      [73,44,123]     73    
>   [73,44,123]     73      0'0     2015-03-12 17:58:30.257371      0'0     
> 2015-03-09 17:55:11.725629
> 3.ae8 77      67      201     67      624427024       342     342     
> active+recovering       2015-03-20 10:50:01.693979      33516'342       
> 261536:149258   [122,144,218]   122     [122,144,218]   122     33516'342     
>   2015-03-12 17:11:01.899062      29638'134       2015-03-11 17:10:59.966372
> #
> 
> 
> PG data is there on multiple OSDs, but Ceph is not recovering the PG. For 
> example:
> 
> # ceph pg map 7.25b
> osdmap e261536 pg 7.25b (7.25b) -> up [194,145,45] acting [194,145,45]
> 
> 
> # ls -l /var/lib/ceph/osd/ceph-194/current/7.25b_head | wc -l
> 17
> 
> # ls -l /var/lib/ceph/osd/ceph-145/current/7.25b_head | wc -l
> 0
> #
> 
> # ls -l /var/lib/ceph/osd/ceph-45/current/7.25b_head | wc -l
> 17
> 
> 
> 
> 
> 
> Some of the PGs are completely lost, i.e. they don't have any data. For 
> example:
> 
> # ceph pg map 10.70
> osdmap e261536 pg 10.70 (10.70) -> up [153,140,80] acting [153,140,80]
> 
> 
> # ls -l /var/lib/ceph/osd/ceph-140/current/10.70_head | wc -l
> 0
> 
> # ls -l /var/lib/ceph/osd/ceph-153/current/10.70_head | wc -l
> 0
> 
> # ls -l /var/lib/ceph/osd/ceph-80/current/10.70_head | wc -l
> 0
> 
> 
> 
> - Karan -
> 
> 
> 
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com>
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
> <http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com>
> 
> 
