Hello Sage and Brad,

Many thanks for the information.
> incomplete PGs can be extracted from the drive if the bad sector(s) don't
> happen to affect those pgs.  The ceph-objectstore-tool --op export command
> can be used for this (extract it from the affected drive and add it to
> some other osd).

==
#ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 --pgid 1.fs1 --op export --file /tmp/test
Exporting 1.fs1
Export successful

#ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1 --op import --file /tmp/test
Importing pgid 1.fs1
Import successful
==

I will try this the next time the issue recurs.

We also need your suggestions on fixing unfound-object errors which occurred in another environment: v11.2.0, bluestore, EC 4+1.

===
1 active+degraded, 8191 active+clean; 29494 GB data, 39323 GB used, 1180 TB / 1218 TB avail; 2/66917305 objects degraded (0.000%); 1/13383461 unfound (0.000%)
===

===
pg 1.93f is active+degraded, acting [206,99,11,290,169], 1 unfound
===

What we tried:

# Restarted all the OSDs associated with that PG: 206, 99, 11, 290, 169.
# All of these OSDs are up and running.
#ceph pg repair 1.93f
#ceph pg deep-scrub 1.93f

As a last resort:
#ceph pg 1.93f mark_unfound_lost delete   { data loss }

Need your views on how to clear the unfound issue without data loss.

Thanks
Jayaram

On Mon, Apr 3, 2017 at 6:50 PM, Sage Weil <sw...@redhat.com> wrote:
> On Fri, 31 Mar 2017, nokia ceph wrote:
> > Hello Brad,
> > Many thanks for the info :)
> >
> > ENV:-- Kraken - bluestore - EC 4+1 - 5 node cluster : RHEL7
> >
> > What is the status of the down+out osd? Only one osd, osd.6, is down and out
> > of the cluster.
> > What role did/does it play? Most importantly, is it osd.6? Yes, due to an
> > underlying I/O error we removed this device from the cluster.
>
> Is the device completely destroyed or is it only returning errors
> when reading certain data?  It is likely that some (or all) of the
> incomplete PGs can be extracted from the drive if the bad sector(s) don't
> happen to affect those pgs.  The ceph-objectstore-tool --op export command
> can be used for this (extract it from the affected drive and add it to
> some other osd).
>
> > I put this parameter "osd_find_best_info_ignore_history_les = true" in
> > ceph.conf, and found those 22 PGs changed to "down+remapped". Now all
> > are back in the "remapped+incomplete" state.
>
> This is usually not a great idea unless you're out of options, by the way!
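(If this flag does end up being needed again as a last resort, a minimal sketch of how it is usually applied to the primary of the stuck PG and then reverted - assuming osd.113 is the primary, as in the pg dump below, and that injectargs takes effect for this option on this build; otherwise it can be set in the [osd] section of ceph.conf followed by an OSD restart:)

===
#ceph tell osd.113 injectargs '--osd_find_best_info_ignore_history_les=true'    { let the PG attempt to peer }
#ceph tell osd.113 injectargs '--osd_find_best_info_ignore_history_les=false'   { revert as soon as it does }
===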
> > #ceph pg stat 2> /dev/null
> > v2731828: 4096 pgs: 1 incomplete, 21 remapped+incomplete, 4074 active+clean; 268 TB data, 371 TB used, 267 TB / 638 TB avail
> >
> > ## ceph -s
> > 2017-03-30 19:02:14.350242 7f8b0415f700 -1 WARNING: the following dangerous and experimental features are enabled: bluestore,rocksdb
> > 2017-03-30 19:02:14.366545 7f8b0415f700 -1 WARNING: the following dangerous and experimental features are enabled: bluestore,rocksdb
> >     cluster bd8adcd0-c36d-4367-9efe-f48f5ab5f108
> >      health HEALTH_ERR
> >             22 pgs are stuck inactive for more than 300 seconds
> >             22 pgs incomplete
> >             22 pgs stuck inactive
> >             22 pgs stuck unclean
> >      monmap e2: 5 mons at {au-adelaide=10.50.21.24:6789/0,au-brisbane=10.50.21.22:6789/0,au-canberra=10.50.21.23:6789/0,au-melbourne=10.50.21.21:6789/0,au-sydney=10.50.21.20:6789/0}
> >             election epoch 180, quorum 0,1,2,3,4 au-sydney,au-melbourne,au-brisbane,au-canberra,au-adelaide
> >         mgr active: au-adelaide
> >      osdmap e6506: 117 osds: 117 up, 117 in; 21 remapped pgs
> >             flags sortbitwise,require_jewel_osds,require_kraken_osds
> >       pgmap v2731828: 4096 pgs, 1 pools, 268 TB data, 197 Mobjects
> >             371 TB used, 267 TB / 638 TB avail
> >                 4074 active+clean
> >                   21 remapped+incomplete
> >                    1 incomplete
> >
> > ## ceph osd dump 2>/dev/null | grep cdvr
> > pool 1 'cdvr_ec' erasure size 5 min_size 4 crush_ruleset 1 object_hash rjenkins pg_num 4096 pgp_num 4096 last_change 456 flags hashpspool,nodeep-scrub stripe_width 65536
> >
> > Inspecting affected PG 1.e4b
> >
> > # ceph pg dump 2> /dev/null | grep 1.e4b
> > 1.e4b 50832 0 0 0 0 73013340821 10006 10006 remapped+incomplete 2017-03-30 14:14:26.297098 3844'161662 6506:325748 [113,66,15,73,103] 113 [NONE,NONE,NONE,73,NONE] 73 1643'139486 2017-03-21 04:56:16.683953 0'0 2017-02-21 10:33:50.012922
> >
> > When I trigger the below command:
> >
> > #ceph pg force_create_pg 1.e4b
> > pg 1.e4b now creating, ok
> >
> > It went to the creating state, with no change after that. Can you explain why this
> > PG shows null values after triggering "force_create_pg"?
> >
> > ]# ceph pg dump 2> /dev/null | grep 1.e4b
> > 1.e4b 0 0 0 0 0 0 0 0 creating 2017-03-30 19:07:00.982178 0'0 0:0 [] -1 [] -1 0'0 0.000000 0'0 0.000000
>
> CRUSH isn't mapping the PG to any OSDs, so there is nowhere to create it,
> it seems?  What does 'ceph pg map <pgid>' show?
>
> > Then I triggered the below command:
> >
> > # ceph pg repair 1.e4b
> > Error EAGAIN: pg 1.e4b has no primary osd --<<
> >
> > Could you please provide answers for the below queries.
> >
> > 1. How to fix this "incomplete+remapped" PG issue? Here all OSDs were up
> > and running, and the affected OSD was marked out and removed from the cluster.
>
> To recover the data, you need to find surviving shards of the PG.
> ceph-objectstore-tool on the "failed" disk is one option, but since this
> is a 4+2 code there should have been another copy that got lost along the
> line... do you know where it is?
>
> > 2. Will reducing min_size help? Currently it is set to 4. Could you please
> > explain what is the impact if we reduce min_size for the current config,
> > EC 4+1?
>
> You can't reduce it below 4 since it's a 4+2 code.  By default we set it
> as 5 (k+1) so that you won't write new data to the PG if a single
> additional failure could lead you to lose those writes.
>
> > 3. Is there any procedure to safely remove an affected PG? As per my
> > understanding I'm aware of this command:
> >
> > ===
> > #ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph --pgid 1.e4b --op remove
> > ===
> >
> > Awaiting your suggestions on how to proceed.
>
> If you don't need the data and just want to recreate the pg empty, then
> the procedure is to remove any surviving fragments and then do
> force_create_pg.  It looks like you need to figure out why the pgid isn't
> mapping to any OSDs first, though.
>
> sage
>
> > Thanks
> >
> > On Thu, Mar 30, 2017 at 7:32 AM, Brad Hubbard <bhubb...@redhat.com> wrote:
> >
> > On Thu, Mar 30, 2017 at 4:53 AM, nokia ceph <nokiacephus...@gmail.com> wrote:
> > > Hello,
> > >
> > > Env:-
> > > 5 node, EC 4+1 bluestore kraken v11.2.0, RHEL7.2
> > >
> > > As part of our resiliency testing with kraken bluestore, we found more PGs
> > > in the incomplete+remapped state. We tried to repair each PG using "ceph pg
> > > repair <pgid>", still no luck. Then we planned to remove the incomplete PGs
> > > using the below procedure.
> > >
> > > #ceph health detail | grep 1.e4b
> > > pg 1.e4b is remapped+incomplete, acting [2147483647,66,15,73,2147483647]
> > > (reducing pool cdvr_ec min_size from 4 may help; search ceph.com/docs for 'incomplete')
> >
> > "Incomplete  Ceph detects that a placement group is missing information about
> > writes that may have occurred, or does not have any healthy copies. If you see
> > this state, try to start any failed OSDs that may contain the needed
> > information."
> >
> > > Here we shut down OSDs 66, 15 and 73, then proceeded with the below operation.
> > >
> > > #ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-135 --op list-pgs
> > > #ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-135 --pgid 1.e4b --op remove
> > >
> > > Please confirm that we are following the correct procedure for removal of PGs.
> >
> > There are multiple threads about that on this very list, "pgs stuck inactive"
> > recently for example.
> >
> > > #ceph pg stat
> > > v2724830: 4096 pgs: 1 active+clean+scrubbing+deep+repair, 1 down+remapped, 21 remapped+incomplete, 4073 active+clean; 268 TB data, 371 TB used, 267 TB / 638 TB avail
> > >
> > > # ceph -s
> > > 2017-03-29 18:23:44.288508 7f8c2b8e5700 -1 WARNING: the following dangerous and experimental features are enabled: bluestore,rocksdb
> > > 2017-03-29 18:23:44.304692 7f8c2b8e5700 -1 WARNING: the following dangerous and experimental features are enabled: bluestore,rocksdb
> > >     cluster bd8adcd0-c36d-4367-9efe-f48f5ab5f108
> > >      health HEALTH_ERR
> > >             22 pgs are stuck inactive for more than 300 seconds
> > >             1 pgs down
> > >             21 pgs incomplete
> > >             1 pgs repair
> > >             22 pgs stuck inactive
> > >             22 pgs stuck unclean
> > >      monmap e2: 5 mons at {au-adelaide=10.50.21.24:6789/0,au-brisbane=10.50.21.22:6789/0,au-canberra=10.50.21.23:6789/0,au-melbourne=10.50.21.21:6789/0,au-sydney=10.50.21.20:6789/0}
> > >             election epoch 172, quorum 0,1,2,3,4 au-sydney,au-melbourne,au-brisbane,au-canberra,au-adelaide
> > >         mgr active: au-brisbane
> > >      osdmap e6284: 118 osds: 117 up, 117 in; 22 remapped pgs
> >
> > What is the status of the down+out osd? What role did/does it play? Most
> > importantly, is it osd.6?
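A quick way to check this (a sketch, assuming the standard CLI; ceph osd find will simply return ENOENT if osd.6 has already been removed from the CRUSH map):

===
#ceph osd tree | grep -w down   { list any down OSDs and their hosts }
#ceph osd find 6                { CRUSH location of osd.6, if it still exists }
#ceph pg map 1.e4b              { current up/acting set for the stuck PG }
===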
> > >             flags sortbitwise,require_jewel_osds,require_kraken_osds
> > >       pgmap v2724830: 4096 pgs, 1 pools, 268 TB data, 197 Mobjects
> > >             371 TB used, 267 TB / 638 TB avail
> > >                 4073 active+clean
> > >                   21 remapped+incomplete
> > >                    1 down+remapped
> > >                    1 active+clean+scrubbing+deep+repair
> > >
> > > #ceph osd dump | grep pool
> > > pool 1 'cdvr_ec' erasure size 5 min_size 4 crush_ruleset 1 object_hash rjenkins pg_num 4096 pgp_num 4096 last_change 456 flags hashpspool,nodeep-scrub stripe_width 65536
> > >
> > > Can you please suggest whether there is any way to wipe out these incomplete PGs?
> >
> > See the thread previously mentioned. Take note of the force_create_pg step.
> >
> > > Why did ceph pg repair fail in this scenario?
> > > How to recover incomplete PGs to the active state?
> > >
> > > pg query for the affected PG ended with this error. Can you please explain
> > > what is meant by this?
> > > ---
> > >                 "15(2)",
> > >                 "66(1)",
> > >                 "73(3)",
> > >                 "103(4)",
> > >                 "113(0)"
> > >             ],
> > >             "down_osds_we_would_probe": [
> > >                 6
> > >             ],
> > >             "peering_blocked_by": [],
> > >             "peering_blocked_by_detail": [
> > >                 {
> > >                     "detail": "peering_blocked_by_history_les_bound"
> > >                 }
> > > ----
> >
> > During multiple intervals osd 6 was in the up/acting set, for example;
> >
> >             {
> >                 "first": 1608,
> >                 "last": 1645,
> >                 "maybe_went_rw": 1,
> >                 "up": [
> >                     113,
> >                     6,
> >                     15,
> >                     73,
> >                     103
> >                 ],
> >                 "acting": [
> >                     113,
> >                     6,
> >                     15,
> >                     73,
> >                     103
> >                 ],
> >                 "primary": 113,
> >                 "up_primary": 113
> >             },
> >
> > Because we may have gone rw during that interval we need to query it and
> > it is blocking progress.
> >
> >             "blocked_by": [
> >                 6
> >             ],
> >
> > Setting osd_find_best_info_ignore_history_les to true may help but then you may
> > need to mark the missing OSD lost or perform some other trickery (and this . I
> > suspect your min_size is too low, especially for a cluster of this size, but EC
> > is not an area I know extensively so I can't say definitively. Some of your
> > questions may be better suited to the ceph-devel mailing list by the way.
> >
> > > Attaching the "ceph pg 1.e4b query > /tmp/1.e4b-pg.txt" file with this mail.
> > >
> > > Thanks
> > >
> > > _______________________________________________
> > > ceph-users mailing list
> > > ceph-users@lists.ceph.com
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> > --
> > Cheers,
> > Brad
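Coming back to the unfound object on pg 1.93f at the top of this mail, a minimal sketch of the usual investigation before falling back to mark_unfound_lost (assuming the stock Kraken CLI; the exact JSON layout of the query output may differ slightly):

===
#ceph health detail | grep 1.93f
#ceph pg 1.93f list_unfound                    { which object(s) are unfound }
#ceph pg 1.93f query > /tmp/1.93f-query.txt    { inspect the "might_have_unfound" section }
#ceph pg 1.93f mark_unfound_lost delete        { last resort - discards the unfound object }
===

If "might_have_unfound" shows an OSD as "osd is down" or "not queried", bringing that OSD back (or extracting its copy of the PG with ceph-objectstore-tool export/import, as above) is the only path that avoids data loss; mark_unfound_lost delete always discards the object.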
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com