Hello Sage and Brad,

Many thanks for the information.
> incomplete PGs can be extracted from the drive if the bad sector(s) don't
> happen to affect those pgs.  The ceph-objectstore-tool --op export command
> can be used for this (extract it from the affected drive and add it to
> some other osd).

==
#ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 --pgid 1.fs1 --op export --file /tmp/test
Exporting 1.fs1
Export successful

#ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1 --op import --file /tmp/test
Importing pgid 1.fs1
Import successful
==

I will try this the next time the issue recurs.

We also need your suggestions on fixing unfound-object errors which occurred in another environment: v11.2.0, bluestore, EC 4+1.

===
1 active+degraded, 8191 active+clean; 29494 GB data, 39323 GB used, 1180 TB / 1218 TB avail; 2/66917305 objects degraded (0.000%); 1/13383461 unfound (0.000%)
===

===
pg 1.93f is active+degraded, acting [206,99,11,290,169], 1 unfound
===

What we tried:

# Restarted all the OSDs associated with that PG: 206, 99, 11, 290, 169.
# All of these OSDs are up and running.
#ceph pg repair 1.93f
#ceph pg deep-scrub 1.93f

As a last resort:
#ceph pg 1.93f mark_unfound_lost delete   { data loss }

Need your views on how to clear the unfound issue without data loss.

Thanks
Jayaram

On Mon, Apr 3, 2017 at 6:50 PM, Sage Weil <sw...@redhat.com> wrote:
> On Fri, 31 Mar 2017, nokia ceph wrote:
> > Hello Brad,
> > Many thanks for the info :)
> >
> > ENV:-- Kraken - bluestore - EC 4+1 - 5 node cluster : RHEL7
> >
> > What is the status of the down+out osd? Only one osd, osd.6, is down and out
> > of the cluster.
> > What role did/does it play? Most importantly, is it osd.6? Yes, due to an
> > underlying I/O error we removed this device from the cluster.
>
> Is the device completely destroyed or is it only returning errors
> when reading certain data?  It is likely that some (or all) of the
> incomplete PGs can be extracted from the drive if the bad sector(s) don't
> happen to affect those pgs.  The ceph-objectstore-tool --op export command
> can be used for this (extract it from the affected drive and add it to
> some other osd).
>
> > I put this parameter "osd_find_best_info_ignore_history_les = true" in
> > ceph.conf, and found those 22 PGs changed to "down+remapped". Now all
> > are back in the "remapped+incomplete" state.
>
> This is usually not a great idea unless you're out of options, by the way!
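(If this flag does end up being needed again as a last resort, a minimal sketch of how it is usually applied to the primary of the stuck PG and then reverted - assuming osd.113 is the primary, as in the pg dump below, and that injectargs takes effect for this option on this build; otherwise it can be set in the [osd] section of ceph.conf followed by an OSD restart:)

===
#ceph tell osd.113 injectargs '--osd_find_best_info_ignore_history_les=true'    { let the PG attempt to peer }
#ceph tell osd.113 injectargs '--osd_find_best_info_ignore_history_les=false'   { revert as soon as it does }
===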
> > #ceph pg stat 2> /dev/null
> > v2731828: 4096 pgs: 1 incomplete, 21 remapped+incomplete, 4074 active+clean; 268 TB data, 371 TB used, 267 TB / 638 TB avail
> >
> > ## ceph -s
> > 2017-03-30 19:02:14.350242 7f8b0415f700 -1 WARNING: the following dangerous and experimental features are enabled: bluestore,rocksdb
> > 2017-03-30 19:02:14.366545 7f8b0415f700 -1 WARNING: the following dangerous and experimental features are enabled: bluestore,rocksdb
> >     cluster bd8adcd0-c36d-4367-9efe-f48f5ab5f108
> >      health HEALTH_ERR
> >             22 pgs are stuck inactive for more than 300 seconds
> >             22 pgs incomplete
> >             22 pgs stuck inactive
> >             22 pgs stuck unclean
> >      monmap e2: 5 mons at {au-adelaide=10.50.21.24:6789/0,au-brisbane=10.50.21.22:6789/0,au-canberra=10.50.21.23:6789/0,au-melbourne=10.50.21.21:6789/0,au-sydney=10.50.21.20:6789/0}
> >             election epoch 180, quorum 0,1,2,3,4 au-sydney,au-melbourne,au-brisbane,au-canberra,au-adelaide
> >         mgr active: au-adelaide
> >      osdmap e6506: 117 osds: 117 up, 117 in; 21 remapped pgs
> >             flags sortbitwise,require_jewel_osds,require_kraken_osds
> >       pgmap v2731828: 4096 pgs, 1 pools, 268 TB data, 197 Mobjects
> >             371 TB used, 267 TB / 638 TB avail
> >                 4074 active+clean
> >                   21 remapped+incomplete
> >                    1 incomplete
> >
> > ## ceph osd dump 2>/dev/null | grep cdvr
> > pool 1 'cdvr_ec' erasure size 5 min_size 4 crush_ruleset 1 object_hash rjenkins pg_num 4096 pgp_num 4096 last_change 456 flags hashpspool,nodeep-scrub stripe_width 65536
> >
> > Inspecting affected PG 1.e4b
> >
> > # ceph pg dump 2> /dev/null | grep 1.e4b
> > 1.e4b 50832 0 0 0 0 73013340821 10006 10006 remapped+incomplete 2017-03-30 14:14:26.297098 3844'161662 6506:325748 [113,66,15,73,103] 113 [NONE,NONE,NONE,73,NONE] 73 1643'139486 2017-03-21 04:56:16.683953 0'0 2017-02-21 10:33:50.012922
> >
> > When I trigger the below command:
> >
> > #ceph pg force_create_pg 1.e4b
> > pg 1.e4b now creating, ok
> >
> > It went to the creating state, with no change after that. Can you explain why this
> > PG shows null values after triggering "force_create_pg"?
> >
> > ]# ceph pg dump 2> /dev/null | grep 1.e4b
> > 1.e4b 0 0 0 0 0 0 0 0 creating 2017-03-30 19:07:00.982178 0'0 0:0 [] -1 [] -1 0'0 0.000000 0'0 0.000000
>
> CRUSH isn't mapping the PG to any OSDs, so there is nowhere to create it,
> it seems?  What does 'ceph pg map <pgid>' show?
>
> > Then I triggered the below command:
> >
> > # ceph pg repair 1.e4b
> > Error EAGAIN: pg 1.e4b has no primary osd --<<
> >
> > Could you please provide answers for the below queries.
> >
> > 1. How to fix this "incomplete+remapped" PG issue? Here all OSDs were up
> > and running, and the affected OSD was marked out and removed from the cluster.
>
> To recover the data, you need to find surviving shards of the PG.
> ceph-objectstore-tool on the "failed" disk is one option, but since this
> is a 4+2 code there should have been another copy that got lost along the
> line... do you know where it is?
>
> > 2. Will reducing min_size help? Currently it is set to 4. Could you please
> > explain what is the impact if we reduce min_size for the current config,
> > EC 4+1?
>
> You can't reduce it below 4 since it's a 4+2 code.  By default we set it
> as 5 (k+1) so that you won't write new data to the PG if a single
> additional failure could lead you to lose those writes.
>
> > 3. Is there any procedure to safely remove an affected PG? As per my
> > understanding I'm aware of this command:
> >
> > ===
> > #ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph --pgid 1.e4b --op remove
> > ===
> >
> > Awaiting your suggestions on how to proceed.
>
> If you don't need the data and just want to recreate the pg empty, then
> the procedure is to remove any surviving fragments and then do
> force_create_pg.  It looks like you need to figure out why the pgid isn't
> mapping to any OSDs first, though.
>
> sage
>
> > Thanks
> >
> > On Thu, Mar 30, 2017 at 7:32 AM, Brad Hubbard <bhubb...@redhat.com> wrote:
> >
> > On Thu, Mar 30, 2017 at 4:53 AM, nokia ceph <nokiacephus...@gmail.com> wrote:
> > > Hello,
> > >
> > > Env:-
> > > 5 node, EC 4+1 bluestore kraken v11.2.0, RHEL7.2
> > >
> > > As part of our resiliency testing with kraken bluestore, we found more PGs
> > > in the incomplete+remapped state. We tried to repair each PG using "ceph pg
> > > repair <pgid>", still no luck. Then we planned to remove the incomplete PGs
> > > using the below procedure.
> > >
> > > #ceph health detail | grep 1.e4b
> > > pg 1.e4b is remapped+incomplete, acting [2147483647,66,15,73,2147483647]
> > > (reducing pool cdvr_ec min_size from 4 may help; search ceph.com/docs for 'incomplete')
> >
> > "Incomplete  Ceph detects that a placement group is missing information about
> > writes that may have occurred, or does not have any healthy copies. If you see
> > this state, try to start any failed OSDs that may contain the needed
> > information."
> >
> > > Here we shut down OSDs 66, 15 and 73, then proceeded with the below operation.
> > >
> > > #ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-135 --op list-pgs
> > > #ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-135 --pgid 1.e4b --op remove
> > >
> > > Please confirm that we are following the correct procedure for removal of PGs.
> >
> > There are multiple threads about that on this very list, "pgs stuck inactive"
> > recently for example.
> >
> > > #ceph pg stat
> > > v2724830: 4096 pgs: 1 active+clean+scrubbing+deep+repair, 1 down+remapped, 21 remapped+incomplete, 4073 active+clean; 268 TB data, 371 TB used, 267 TB / 638 TB avail
> > >
> > > # ceph -s
> > > 2017-03-29 18:23:44.288508 7f8c2b8e5700 -1 WARNING: the following dangerous and experimental features are enabled: bluestore,rocksdb
> > > 2017-03-29 18:23:44.304692 7f8c2b8e5700 -1 WARNING: the following dangerous and experimental features are enabled: bluestore,rocksdb
> > >     cluster bd8adcd0-c36d-4367-9efe-f48f5ab5f108
> > >      health HEALTH_ERR
> > >             22 pgs are stuck inactive for more than 300 seconds
> > >             1 pgs down
> > >             21 pgs incomplete
> > >             1 pgs repair
> > >             22 pgs stuck inactive
> > >             22 pgs stuck unclean
> > >      monmap e2: 5 mons at {au-adelaide=10.50.21.24:6789/0,au-brisbane=10.50.21.22:6789/0,au-canberra=10.50.21.23:6789/0,au-melbourne=10.50.21.21:6789/0,au-sydney=10.50.21.20:6789/0}
> > >             election epoch 172, quorum 0,1,2,3,4 au-sydney,au-melbourne,au-brisbane,au-canberra,au-adelaide
> > >         mgr active: au-brisbane
> > >      osdmap e6284: 118 osds: 117 up, 117 in; 22 remapped pgs
> >
> > What is the status of the down+out osd? What role did/does it play? Most
> > importantly, is it osd.6?
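A quick way to check this (a sketch, assuming the standard CLI; ceph osd find will simply return ENOENT if osd.6 has already been removed from the CRUSH map):

===
#ceph osd tree | grep -w down   { list any down OSDs and their hosts }
#ceph osd find 6                { CRUSH location of osd.6, if it still exists }
#ceph pg map 1.e4b              { current up/acting set for the stuck PG }
===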
> > >             flags sortbitwise,require_jewel_osds,require_kraken_osds
> > >       pgmap v2724830: 4096 pgs, 1 pools, 268 TB data, 197 Mobjects
> > >             371 TB used, 267 TB / 638 TB avail
> > >                 4073 active+clean
> > >                   21 remapped+incomplete
> > >                    1 down+remapped
> > >                    1 active+clean+scrubbing+deep+repair
> > >
> > > #ceph osd dump | grep pool
> > > pool 1 'cdvr_ec' erasure size 5 min_size 4 crush_ruleset 1 object_hash rjenkins pg_num 4096 pgp_num 4096 last_change 456 flags hashpspool,nodeep-scrub stripe_width 65536
> > >
> > > Can you please suggest whether there is any way to wipe out these incomplete PGs?
> >
> > See the thread previously mentioned. Take note of the force_create_pg step.
> >
> > > Why did ceph pg repair fail in this scenario?
> > > How to recover incomplete PGs to the active state?
> > >
> > > pg query for the affected PG ended with this error. Can you please explain
> > > what is meant by this?
> > > ---
> > >                 "15(2)",
> > >                 "66(1)",
> > >                 "73(3)",
> > >                 "103(4)",
> > >                 "113(0)"
> > >             ],
> > >             "down_osds_we_would_probe": [
> > >                 6
> > >             ],
> > >             "peering_blocked_by": [],
> > >             "peering_blocked_by_detail": [
> > >                 {
> > >                     "detail": "peering_blocked_by_history_les_bound"
> > >                 }
> > > ----
> >
> > During multiple intervals osd 6 was in the up/acting set, for example;
> >
> >             {
> >                 "first": 1608,
> >                 "last": 1645,
> >                 "maybe_went_rw": 1,
> >                 "up": [
> >                     113,
> >                     6,
> >                     15,
> >                     73,
> >                     103
> >                 ],
> >                 "acting": [
> >                     113,
> >                     6,
> >                     15,
> >                     73,
> >                     103
> >                 ],
> >                 "primary": 113,
> >                 "up_primary": 113
> >             },
> >
> > Because we may have gone rw during that interval we need to query it and
> > it is blocking progress.
> >
> >             "blocked_by": [
> >                 6
> >             ],
> >
> > Setting osd_find_best_info_ignore_history_les to true may help but then you may
> > need to mark the missing OSD lost or perform some other trickery (and this . I
> > suspect your min_size is too low, especially for a cluster of this size, but EC
> > is not an area I know extensively so I can't say definitively. Some of your
> > questions may be better suited to the ceph-devel mailing list by the way.
> >
> > > Attaching the "ceph pg 1.e4b query > /tmp/1.e4b-pg.txt" file with this mail.
> > >
> > > Thanks
> > >
> > > _______________________________________________
> > > ceph-users mailing list
> > > ceph-users@lists.ceph.com
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> > --
> > Cheers,
> > Brad
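Coming back to the unfound object on pg 1.93f at the top of this mail, a minimal sketch of the usual investigation before falling back to mark_unfound_lost (assuming the stock Kraken CLI; the exact JSON layout of the query output may differ slightly):

===
#ceph health detail | grep 1.93f
#ceph pg 1.93f list_unfound                    { which object(s) are unfound }
#ceph pg 1.93f query > /tmp/1.93f-query.txt    { inspect the "might_have_unfound" section }
#ceph pg 1.93f mark_unfound_lost delete        { last resort - discards the unfound object }
===

If "might_have_unfound" shows an OSD as "osd is down" or "not queried", bringing that OSD back (or extracting its copy of the PG with ceph-objectstore-tool export/import, as above) is the only path that avoids data loss; mark_unfound_lost delete always discards the object.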
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com